One of the main research areas in Bayesian Nonparametrics is the proposal and study of priors which 
generalize the Dirichlet process. Here we exploit theoretical properties of Poisson random measures in 
order to provide a comprehensive Bayesian analysis of random probabilities which are obtained by 
an appropriate normalization. Specifically we achieve explicit and tractable forms of the posterior 
and the marginal distributions, including an explicit and easily used description of generalizations of 
the important Blackwell-MacQueen Polya urn distribution. Such simplifications are achieved by the 
use of a latent variable which admits interesting interpretations that give a better 
understanding of the behaviour of these random probability measures. It is noteworthy that these models 
are generalizations of models considered by Kingman (1975) in a non-Bayesian context. Such models are 
known to play a significant role in a variety of applications including genetics, physics, and work involving 
random mappings and assemblies. Hence our analysis is of utility in those contexts as well. We also show 
how our results may be applied to Bayesian mixture models and describe computational schemes which 
are generalizations of known efficient methods for the case of the Dirichlet process. We illustrate new 
examples of processes which can play the role of priors for Bayesian nonparametric inference and finally 
point out some interesting connections with the theory of generalized gamma convolutions initiated by 
Thorin and further developed by Bondesson. 


1 Introduction 

A key problem in Bayesian nonparametric inference is the definition of a prior distribution on the 
space of all probability measures. Starting from the papers by Ferguson (1973) and Freedman (1963), 
in which the celebrated Dirichlet process was introduced, various approaches for constructing 
random probability measures, whose distribution acts as a nonparametric prior, have been undertaken. 
They all aim at providing generalizations of the Dirichlet process. Among them we mention 
the neutral-to-the-right random probability measures due to Doksum (1974), which are obtained via 
an exponential transformation of an increasing process with independent increments, and the 
Polya-tree priors thoroughly studied by Mauldin, Sudderth and Williams (1992) and Lavine (1992), which 
arise by considering suitable urn schemes on trees of nested partitions. Moreover, it is well-known 
that the Dirichlet process can be defined by normalizing the increments of a gamma process. The 
idea of constructing random probability measures by means of a normalization procedure has been 
exploited and developed in a variety of contexts not closely related to Bayesian inference. Indeed it 
has found many interesting applications: Kingman (1975) and Janson (2001) for storage problems 
and applications to computer science; Ewens and Tavare (1995) and Grote and Speed (2002) for 
population genetics; Engen (1978) and McCloskey (1965) for ecology; Derrida (1981) and Ruelle 
(1987) for statistical physics; Donnelly and Grimmett (1993) and Pitman (2002) for combinatorics 
and number theory; Pitman (1997) and Pitman and Yor (1997) for excursion theory. 
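
The normalization idea can be illustrated in finite dimensions: independent gamma increments over a partition, divided by their sum, give Dirichlet-distributed probabilities. The following Python sketch assumes a hypothetical three-set partition whose base-measure masses are stored in `alpha`; the function and variable names are illustrative and do not come from the paper.

```python
import random

def normalized_gamma_weights(alpha, rng):
    """Normalize independent Gamma(alpha_i, 1) increments into probabilities."""
    increments = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(increments)  # total mass, positive almost surely
    return [g / total for g in increments]

rng = random.Random(0)
alpha = [0.5, 1.0, 2.5]  # hypothetical base-measure masses on a 3-set partition
draws = [normalized_gamma_weights(alpha, rng) for _ in range(20000)]
# Monte Carlo mean of each weight; the Dirichlet mean is alpha_i / sum(alpha)
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
```

The empirical means should be close to `alpha[i] / sum(alpha)`, which is the mean of the corresponding Dirichlet coordinate.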

Kingman's (1975) paper suggests that one can construct random probability measures as follows. 
First take the ranked points of a homogeneous Poisson process on $(0, \infty)$, say 
$\Delta_1 > \Delta_2 > \ldots$, such that their sum $\sum_{i \geq 1} \Delta_i$ is finite and positive 
almost surely. Use these points to construct a sequence of probabilities, 
$(Q_i = \Delta_i / \sum_{k \geq 1} \Delta_k)$. Independent of this sequence choose $(Z_i)$ to be an 
iid sequence of random elements of a Polish space with common distribution, say $H$. A random 
probability measure is then formed by $\sum_{i \geq 1} Q_i \delta_{Z_i}$. Two special cases include the 
Dirichlet process [Ferguson (1973)] and a class of random probability measures based on the stable 
law discussed in, for instance, Pitman (1996). It is interesting to note that much of the analysis 
related to such quantities focuses exclusively on the behaviour of the sequence $(Q_i)$. Such studies 
were carried out for instance by Perman, Pitman and Yor (1992) and Pitman (2003). Note that because 
of the independence of $(Q_i)$ and $(Z_i)$, the distinctive features of different classes of random 
probability measures are in fact deduced exclusively from the analysis of $(Q_i)$. However, with the 
exception of the Dirichlet process and those models based on a stable law, the analysis related to 
the bulk of these processes has yet to yield tractable results suitable for practical implementation 
in Bayesian nonparametric problems. This is due in part to the fact that the focus in Kingman (1975) 
and the majority of the subsequent analysis involves considerations other than Bayesian applications. 
Thus the issues of general tractability, in terms of their possible usage in a Bayesian context, 
raised by Adrian Smith and others in Kingman (1975), remain open. 

1 AMS 2000 subject classifications. Primary 62G05; secondary 62F15. 

In this paper we provide an analysis, for a larger class of models, with a view toward practical 
implementation and a better theoretical understanding of such models in a Bayesian context. Using 
the Dirichlet process as a benchmark, such an analysis requires a suitable description of the posterior 
distribution, analogous to that given by Ferguson (1973). Additionally, noting the success of the 
Dirichlet process in complex mixture models, we find tractable analogues of the Blackwell-MacQueen 
Polya Urn and accompanying Chinese restaurant distribution which is related to the Ewens sampling 
formula derived by Ewens (1972) and Antoniak (1974). This then paves the way for the description 
of implementable MCMC and SIS computational procedures to approximate efficiently posterior 
quantities in applications such as hierarchical mixture modeling. Also, the exchangeable marginal 
distribution is equivalent to the notion of the moment measures of a random probability measure 
and hence is basic to the understanding of its theoretical properties and to the calculation of higher 
order moments. Of course, as a natural byproduct, our analysis has implications in the related 
non-Bayesian contexts described above. Our methodology follows the paradigm laid out in James (2002, 
2005a). James (2005a) points out that, in analogy to the use of the classical Bayes rule, one often has 
to introduce additional refinements to obtain the most tractable forms. Here we show that such 
refinements are achieved by exploiting a latent structure derived from 
the gamma identity. 

We shall actually analyze a richer and more complex generalization of the model suggested by 
Kingman (1975), which has recently arisen in a Bayesian context. Regazzini, Lijoi and Prünster 
(2003) consider random probability measures obtained by a normalization of suitably time-changed 
increasing processes with independent and not necessarily stationary increments. Their interest 
was primarily the formidable problem of determining the distribution of mean functionals. 
That is a generalization of the important body of work initiated by Cifarelli and Regazzini (1990). 
James (2002), using an approach closely connected to Perman, Pitman and Yor (1992), considers 
a more general $h$-biased variation of the random probability measures which allows for an 
extension to arbitrary Polish spaces. The constructions coincide on Euclidean spaces when $h(s) = s$, 
for $s \in (0, \infty)$. In any case, such models can also be represented in terms of corresponding 
$(Q_i)$ and $(Z_i)$. However, now these two sequences are not necessarily independent, meaning one 
cannot just analyze the $(Q_i)$. Moreover the $(Q_i)$ are now generated by an $h$-biased structure. 

We also provide a more specific analysis of three classes of random probability measures which 
are somehow connected to the Dirichlet process, and hence inherit many of its desirable features, 
but are otherwise much more general. Specifically we define and examine a class of dependent 
Dirichlet processes and a class of random probability measures connected to the beta process. Lastly, 
using the $h$-biased framework, we construct random probability measures based on the theory of 
Generalized Gamma Convolutions (GGC), initiated by Thorin (1977, 1978), and provide a detailed analysis. 
An interesting feature of this construction is that we are able to embed a very large class 
of random probability measures within a tractable framework of models which are extensions of 
Dirichlet process mean functionals. More precisely, these are models derived from a gamma process 
with a shape measure which is sigma-finite and therefore has possibly infinite total mass. Our 
construction includes, for instance, the stable law random probability measure and hence the entire 
class of the two-parameter Poisson-Dirichlet process. Moreover our results and discussions are 
strongly suggestive of a more synergistic interplay between the theory of GGC and the study of the 
laws of extensions of mean functionals of the Dirichlet process. 

We next describe the formal construction and discuss the key features of our analysis. 


1.1 Preliminaries 

Let $N$ denote a Poisson random measure on an arbitrary Polish space $\mathscr{S} \times \mathscr{X}$, 
with mean intensity 

$$E[N(ds, dx)] = \nu(ds, dx).$$

Denote the law of $N$ as $\mathbb{P}(\cdot|\nu)$ with $E[\cdot|\nu]$ denoting expectation with respect 
to $\mathbb{P}(\cdot|\nu)$. The distribution of $N$ is characterized by the Laplace functional 

$$E\left[e^{-N(g)} \,\middle|\, \nu\right] = \exp\left\{-\int_{\mathscr{S} \times \mathscr{X}} \left(1 - e^{-g(s,x)}\right) \nu(ds, dx)\right\} \tag{1}$$

where $N(g) = \int_{\mathscr{S} \times \mathscr{X}} g(s,x) N(ds,dx)$ and $g$ is any positive function. 
Throughout, as in Daley and Vere-Jones (1986), we have that $N$ and all related functionals take their 
values in the space of boundedly finite measures on $\mathscr{S} \times \mathscr{X}$, say $\mathcal{M}$. 
A measure, say $\lambda$, is boundedly finite if for each bounded set $A$, $\lambda(A) < \infty$. 
According to the decomposition of the intensity measure, we distinguish the two following cases: 

(i) if $\nu(ds, dx) = \rho(ds) H(dx)$ we say that $N$ and its related functionals are homogeneous; 

(ii) if $\nu(ds, dx) = \rho(ds|x)\eta(dx) = F_0(dx|s)\rho(ds)$, $N$ and its related functionals are 
non-homogeneous. 

Here $H$ and $F_0$ are probabilities on $\mathscr{X}$, and $\eta$ is a finite measure on $\mathscr{X}$. 
The quantity $\rho(ds|x)$ is an inhomogeneous Lévy measure for each fixed $x$. The quantity 

$$\rho(ds) = \int_{\mathscr{X}} \rho(ds|x)\eta(dx)$$

is then a homogeneous Lévy measure on $\mathscr{S}$. Let $(J_i, Z_i)$ denote the sequence of points of 
$N$ on $\mathscr{S} \times \mathscr{X}$. Using the decomposition $F_0(dx|s)\rho(ds)$, it follows that for 
each $i$, the conditional distribution of $Z_i$ given $J_i$ is 

$$\mathbb{P}(Z_i \in dx | J_1, J_2, \ldots) := \mathbb{P}(Z_i \in dx | J_i) = F_0(dx|J_i).$$

Moreover, marginally, the sequence $(J_i)$ consists of the random points of a homogeneous 
Poisson random measure with mean intensity $\rho(ds)$. See for instance Resnick (1987, section 3.3.2). 

Let $h$ denote a strictly positive function on $\mathscr{S}$. Now define a random measure on 
$\mathscr{X}$, which is representable in distribution as 

$$\mu(dx) = \int_{\mathscr{S}} h(s) N(ds, dx)$$

such that its total mass $T := \mu(\mathscr{X}) = \sum_{i \geq 1} h(J_i)$ is strictly positive and almost 
surely finite. This happens if $\nu(\mathscr{S} \times \mathscr{X}) = +\infty$ and, in the Laplace transform 

$$\phi(\lambda) = E[e^{-\lambda T}] = e^{-\psi(\lambda)},$$

the exponent 

$$\psi(\lambda) := \int_{\mathscr{S} \times \mathscr{X}} \left(1 - e^{-\lambda h(s)}\right) \nu(ds, dx)$$

is finite for any positive $\lambda$. We now can define a class of random probability measures on 
$\mathscr{X}$ representable in distribution as 

$$P(dx) = \frac{\mu(dx)}{T} = \sum_{i=1}^{\infty} Q_i \delta_{Z_i}(dx) = \sum_{i=1}^{\infty} \frac{h(J_i)}{T} \delta_{Z_i}(dx). \tag{2}$$

We will call the class of $P$ defined in (2) normalized random measures or NRMs. In Regazzini 
et al. (2003) an analogous construction, with $\mathscr{S} = (0, +\infty)$ and $\mathscr{X} = \mathbb{R}$, 
has been developed by normalizing the increments of an increasing additive process. 
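
As a concrete check of the Laplace exponent, consider the homogeneous gamma case with $h(s) = s$ and Lévy measure $\theta s^{-1} e^{-s} ds$ (the Dirichlet process choice recalled in Remark 8), for which the exponent has the closed form $\theta \log(1 + \lambda)$. The Python sketch below compares a simple midpoint-rule approximation with that closed form; the function name, grid size and truncation point are assumptions made for illustration.

```python
import math

def laplace_exponent(lam, theta, upper=60.0, steps=200000):
    """Midpoint rule for psi(lam) = int_0^inf (1 - e^{-lam s}) theta s^{-1} e^{-s} ds."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * width
        total += (1.0 - math.exp(-lam * s)) * theta * math.exp(-s) / s
    return total * width

theta, lam = 2.0, 3.0
psi_numeric = laplace_exponent(lam, theta)
psi_closed = theta * math.log(1.0 + lam)  # closed form in the gamma case
```

Since the exponent is finite for every positive argument and the Lévy measure has infinite mass near zero, the total mass in this example is strictly positive and almost surely finite, as the construction requires.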

Remark 1. Note that the generality of $\mathscr{X}$ allows for quite complex spaces. For example, one 
could take $\mathscr{X}$ to be the space of probability measures. Or $\mathscr{X}$ could denote the space 
containing the sample paths of Brownian motion or more general stochastic processes. See Perman, 
Pitman and Yor (1992) for an example involving excursion spaces of Markov processes. 

Remark 2. Pitman (2003) provides an important extension of the class of homogeneous NRM, 
defined by $h(s) = s \in (0, \infty)$, which can be described as follows. For a homogeneous $\rho$, 
denote the law of the sequence $(J_i/T)$ as $PK(\rho)$. Furthermore, define the law of 
$(J_i/T)|T = t$ as $PK(\rho|t)$, where $T$ has density $f_T$. Then this class generates the class of 
laws on $(J_i/T)$ given by $PK(\rho, \gamma_T) = \int_0^\infty PK(\rho|t)\gamma_T(dt)$, where $\gamma_T$ 
is some density of $T$. An interesting special case is the choice of 
$\gamma_T(t) \propto t^{-q} f_T(t)$, which, with the exception of the Dirichlet process, yields the 
two-parameter Poisson-Dirichlet process when $f_T$ is the density of the stable law of index 
$0 < \alpha < 1$. The idea of conditioning on $T$ is discussed briefly in Kingman (1975). In order to 
capture models like this and more general processes we can instead work with a weighted Poisson law 
of the type $g(N)\mathbb{P}(dN|\nu)$ where we assume that $\int_{\mathcal{M}} g(N)\mathbb{P}(dN|\nu) = 1$. 
Formal details are given in section 8.3. 


2 Posterior Analysis 


Similar to Ferguson (1973) we shall consider the classical setup. Let $(X_n)_{n \geq 1}$ be a sequence 
of $\mathscr{X}$-valued exchangeable random elements, i.e. such that for any $n \geq 1$ the 
$X_1, \ldots, X_n$ are, conditional on $P$, iid with common distribution $P$. Suppose, also, that $P$ is 
an NRM. This yields a description of the joint distribution of $(\mathbf{X}, P)$. We are interested in 
obtaining its description in terms of a posterior distribution of $P|\mathbf{X}$ and the (exchangeable) 
marginal distribution of $\mathbf{X}$, say $\mathscr{M}$. Since the law of $P$ is dominated by the law of 
$N$ it follows that the posterior distribution of $P|\mathbf{X}$ is determined by the posterior 
distribution of $N|\mathbf{X}$. Moreover $\mathscr{M}$ can be expressed as, 

$$\mathscr{M}(dx_1, \ldots, dx_n) = \int_{\mathcal{M}} \left[\prod_{i=1}^{n} P(dx_i)\right] \mathbb{P}(dN|\nu) = \int_{\mathcal{M}} T^{-n} \left[\prod_{i=1}^{n} \int_{\mathscr{S}} h(s_i) N(ds_i, dx_i)\right] \mathbb{P}(dN|\nu)$$

and is the general analogue of the Blackwell-MacQueen Polya Urn distribution. We recall from 
James (2005a) that there is a one to one correspondence between $\mathbf{X}$ and $(\mathbf{Y}, \mathbf{p})$, 
where using notation similar to Lo (1984), $\mathbf{Y} = (Y_1, \ldots, Y_{n(\mathbf{p})})$ denotes the 
distinct values of $\mathbf{X}$ and $\mathbf{p} = \{C_1, \ldots, C_{n(\mathbf{p})}\}$ stands for a partition 
of the integers $\{1, \ldots, n\}$ of size $n(\mathbf{p}) \leq n$ recording which observations are 
equal. The number of elements in the $j$-th cell, $C_j := \{i : X_i = Y_j\}$, of the partition is indicated 
by $e_j$, for $j = 1, \ldots, n(\mathbf{p})$, so that $\sum_{j=1}^{n(\mathbf{p})} e_j = n$. When it is 
necessary to emphasize a further dependence on $n$, we will also use the notation $e_{j,n} := e_j$. It 
follows that the marginal distribution of $\mathbf{X}$ can be expressed in terms of a conditional 
distribution of $\mathbf{X}|\mathbf{p}$, which is the same as a conditional distribution of the unique 
values $\mathbf{Y}|\mathbf{p}$, and the marginal distribution of $\mathbf{p}$. The marginal distribution 
of $\mathbf{p}$, denoted as $\pi(\mathbf{p})$ or $p(e_1, \ldots, e_{n(\mathbf{p})})$, is an exchangeable 
partition probability function (EPPF). 

Remark 3. A detailed general discussion of the EPPF concept may be found in Pitman (2002). 
A discussion of its role in a Bayesian context pertinent to our homogeneous models, which are a 
special case of species sampling models, may be deduced from Pitman (1996) and Ishwaran and 
James (2003a). Its role for inhomogeneous models is a bit different. See James (2005a) for the role 
of the EPPF, and that of $\mathscr{M}$, in this more general Bayesian context. The most well-known EPPF 
is the Chinese restaurant distribution associated with the Dirichlet process. 


2.1 Posterior Distributions 

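Now, similar to Perman, Pitman and Yor (1992, section 4), let 
$N_{n(\mathbf{p})} := N - \sum_{j=1}^{n(\mathbf{p})} \delta_{J_{j,n}, Y_j}$ denote a random measure after 
the first $n(\mathbf{p})$ pairs of unique points $(J_{j,n}, Y_j)$ in $\mathscr{S} \times \mathscr{X}$ are 
picked from $N$ by $h$-biased sampling. Define $N_0 := N$ and note that the law of $N_{n(\mathbf{p})}$ 
depends on the random variable $n(\mathbf{p})$. For each $n$ it follows that 
$N = N_{n(\mathbf{p})} + \sum_{j=1}^{n(\mathbf{p})} \delta_{J_{j,n}, Y_j}$. From this one may define 

$$\mu_{n(\mathbf{p})} = \mu - \sum_{j=1}^{n(\mathbf{p})} h(J_{j,n}) \delta_{Y_j} \quad\text{and}\quad T_{n(\mathbf{p})} = \mu_{n(\mathbf{p})}(\mathscr{X}) = T - \sum_{j=1}^{n(\mathbf{p})} h(J_{j,n}).$$
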

Now, crucial to our exposition, for each $n$ define the random variable $U_n = \Gamma_n/T$, where 
$\Gamma_n$ denotes a gamma random variable with shape $n$ and scale 1 which is independent of $N$ and 
hence $T$. Throughout our exposition $\Gamma_n$ will always be taken to be independent of every random 
variable except for $U_n$. It will be shown in the appendix that the appearance of $U_n$ arises from an 
application of the Gamma identity, 

$$T^{-n} = \frac{1}{\Gamma(n)} \int_0^{\infty} u^{n-1} e^{-uT}\, du. \tag{3}$$


It turns out that, importantly, $U_n$ is also intimately connected to the real inversion formula for 
the Laplace transform. This is discussed in Feller (1971, VII.6) and plays a prominent role in the 
analysis of infinite divisibility by Thorin (1977, 1978) and Bondesson (1979, 1992). We will discuss 
some features of this in the forthcoming sections. 
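
To see where $U_n$ enters, one can substitute the Gamma identity into the product $\prod_{i=1}^n P(dx_i) = T^{-n} \prod_{i=1}^n \mu(dx_i)$ that appears in the marginal distribution of Section 2. The display below is a sketch of that single step under the definitions above, not a reproduction of the appendix argument.

```latex
\begin{align*}
\prod_{i=1}^{n} P(dx_i)
  &= T^{-n} \prod_{i=1}^{n} \mu(dx_i) \\
  &= \frac{1}{\Gamma(n)} \int_{0}^{\infty} u^{n-1} e^{-uT}
     \prod_{i=1}^{n} \mu(dx_i) \, du \\
  &= \int_{0}^{\infty} \Bigl[ \prod_{i=1}^{n} P(dx_i) \Bigr]
     \frac{T^{n} u^{n-1} e^{-uT}}{\Gamma(n)} \, du .
\end{align*}
```

The factor $T^{n} u^{n-1} e^{-uT}/\Gamma(n)$ in the last line is the density at $u$ of $\Gamma_n/T$ given $T$, so the integration variable can be read as a realization of $U_n$, and the exponential term $e^{-uT}$ is what produces the tilted intensities in the results that follow.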

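Define for each integer $l$, cumulants, 

$$\tau_l(u|y) = \int_{\mathscr{S}} [h(s)]^{l} e^{-uh(s)} \rho(ds|y) \quad\text{and}\quad \kappa_l(u) = \int_{\mathscr{S}} [h(s)]^{l} e^{-uh(s)} \rho(ds).$$

Define conditional distributions of the $(J_{j,n})$ given $(U_n, \mathbf{X})$, as 

$$\mathbb{P}(J_{j,n} \in ds | Y_j, u) = \frac{[h(s)]^{e_{j,n}} e^{-uh(s)} \rho(ds|Y_j)}{\tau_{e_{j,n}}(u|Y_j)} \quad\text{for } j = 1, \ldots, n(\mathbf{p}). \tag{4}$$

Note that, conditional on $U_n$, each $J_{j,n}$ only depends upon $\mathbf{X}$ through $(e_j, Y_j)$. In the 
homogeneous case their distributions no longer depend upon $Y_j$ and correspond to the distribution of 
$(J_{j,n})|U_n, \mathbf{p}$ expressible as 
$\mathbb{P}(J_{j,n} \in ds|u) \propto [h(s)]^{e_j} e^{-uh(s)} \rho(ds)$. We note further that 

$$E[h(J_{j,n})|u, \mathbf{X}] = \frac{\tau_{1+e_{j,n}}(u|Y_j)}{\tau_{e_{j,n}}(u|Y_j)} \quad\text{and}\quad E[h(J_{j,n})|u, \mathbf{p}] = \frac{\kappa_{1+e_{j,n}}(u)}{\kappa_{e_{j,n}}(u)}.$$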
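
In the homogeneous gamma case with $h(s) = s$ and Lévy measure $\theta s^{-1} e^{-s} ds$ (the Dirichlet process choice of Remark 8), the conditional law of a jump given $u$ is a gamma distribution with shape $e_j$ and rate $1 + u$, and the cumulants have the closed form $\theta \Gamma(l)(1+u)^{-l}$. The Python sketch below, with illustrative function names and parameter values that are not from the paper, compares a Monte Carlo mean of sampled jumps with the ratio of cumulants.

```python
import math
import random

def kappa_gamma_case(l, u, theta):
    """Closed form kappa_l(u) = theta * Gamma(l) * (1 + u)^(-l) for the gamma Levy measure."""
    return theta * math.gamma(l) * (1.0 + u) ** (-l)

def sample_jump(e_j, u, rng):
    """Draw J with density proportional to s^{e_j} e^{-u s} s^{-1} e^{-s}, i.e. Gamma(e_j, scale 1/(1+u))."""
    return rng.gammavariate(e_j, 1.0 / (1.0 + u))

rng = random.Random(1)
theta, u, e_j = 1.5, 0.8, 3
draws = [sample_jump(e_j, u, rng) for _ in range(50000)]
mc_mean = sum(draws) / len(draws)
# ratio of consecutive cumulants, which equals e_j / (1 + u) in this case
ratio = kappa_gamma_case(e_j + 1, u, theta) / kappa_gamma_case(e_j, u, theta)
```

The Monte Carlo mean should agree with the cumulant ratio up to simulation error, illustrating the expression for the conditional mean of $h(J_{j,n})$ given $u$ and the partition.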


We now use these variables to describe the relevant posterior distributions and related quantities. 


Theorem 2.1 Let $P$ denote a NRM defined by the Poisson random measure $N$ with mean intensity 
$\nu(ds, dx) = \rho(ds|x)\eta(dx)$. Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a vector of random 
elements on a Polish space such that $X_1, \ldots, X_n|P$ are iid $P$. Then the following results hold. 

(i) The posterior distribution of $N$, given $(U_n, \mathbf{X})$, coincides with the conditional 
distribution of the random measure 
$N_n^* = N_{n(\mathbf{p})} + \sum_{j=1}^{n(\mathbf{p})} \delta_{(J_{j,n}, Y_j)}$, where conditional on 
$U_n = u$ and $\mathbf{X}$, $N_{n(\mathbf{p})}$ is a Poisson random measure with intensity 

$$\nu_u(ds, dx) = e^{-uh(s)} \rho(ds|x)\eta(dx) = e^{-uh(s)} F_0(dx|s)\rho(ds), \tag{5}$$

not depending on $\mathbf{X}$, except through $U_n$. 

(ii) Additionally, the $(J_{j,n})$ are, conditional on $(U_n, \mathbf{X})$, independent of 
$N_{n(\mathbf{p})}$ and are mutually independent with each $J_{j,n}$ having the distribution specified 
in (4). 

(iii) The posterior density of $U_n|\mathbf{X}$ is equivalent to the density of 
$\Gamma_n/[T_{n(\mathbf{p})} + \sum_{j=1}^{n(\mathbf{p})} h(J_{j,n})]$, which is 
$f_{U_n}(u|\mathbf{X}) \propto u^{n-1} e^{-\psi(u)} \prod_{j=1}^{n(\mathbf{p})} \tau_{e_j}(u|Y_j)$. 
Similarly, the density of $U_n|\mathbf{p}$ is 
$f_{U_n}(u|\mathbf{p}) \propto u^{n-1} e^{-\psi(u)} \prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)$. 

Since $\mu$ and $P$ are functionals of $N$ the next two results follow immediately. 

Proposition 2.1 The posterior distribution of $\mu|U_n, \mathbf{X}$ is equivalent to the conditional 
distribution, given $U_n, \mathbf{X}$, of the random measure 
$\mu_n^*(dx) = \int_{\mathscr{S}} h(s) N_n^*(ds, dx) = \mu_{n(\mathbf{p})}(dx) + \sum_{j=1}^{n(\mathbf{p})} h(J_{j,n}) \delta_{Y_j}(dx)$, 
where conditional on $U_n$ and $\mathbf{X}$, 
$\mu_{n(\mathbf{p})}(dx) := \int_{\mathscr{S}} h(s) N_{n(\mathbf{p})}(ds, dx)$ is a completely random 
measure with Lévy measure specified in (5). This implies that the density of 
$T_{n(\mathbf{p})}|U_n = u, \mathbf{X}$ is $f_{T_{n(\mathbf{p})}}(t|u) = e^{-ut} e^{\psi(u)} f_T(t)$. 

Proposition 2.2 The posterior distribution of $P|\mathbf{X}$ is equivalent to the conditional 
distribution of the random probability measure, 

$$P_n^*(dy) = R_{n(\mathbf{p})} P_{n(\mathbf{p})}(dy) + \sum_{j=1}^{n(\mathbf{p})} Q_{j,n} \delta_{Y_j}(dy),$$

where $P_{n(\mathbf{p})}(dx) = \mu_{n(\mathbf{p})}(dx)/T_{n(\mathbf{p})}$, 
$R_{n(\mathbf{p})} = T_{n(\mathbf{p})}/T = 1 - \sum_{j=1}^{n(\mathbf{p})} Q_{j,n}$ and 
$Q_{j,n} = h(J_{j,n})/T$ for $j = 1, \ldots, n(\mathbf{p})$. The distribution of all quantities, given 
$(U_n, \mathbf{X})$, is specified by Theorem 2.1. 

We now provide an initial description of $\mathscr{M}$. 

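Proposition 2.3 The exchangeable marginal distribution $\mathscr{M}$ of the observations $\mathbf{X}$ 
can be represented as 

$$\mathscr{M}(dx_1, \ldots, dx_n) = \left[\prod_{j=1}^{n(\mathbf{p})} \eta(dy_j)\right] \frac{1}{\Gamma(n)} \int_0^{+\infty} u^{n-1} \left[\prod_{j=1}^{n(\mathbf{p})} \tau_{e_j}(u|y_j)\right] e^{-\psi(u)}\, du. \tag{6}$$

The corresponding EPPF is 

$$\pi(\mathbf{p}) = \frac{1}{\Gamma(n)} \int_0^{+\infty} u^{n-1} \left[\prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)\right] e^{-\psi(u)}\, du.$$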


Remark 4. In the case where $h(s) = s \in (0, \infty)$, the EPPF appearing in Proposition 2.3 
was first obtained by Pitman (2003, Corollary 6). Note that EPPFs appearing in Proposition 2.3 
are examples of infinite EPPFs. In the next section we shall show that they may be represented 
as a mixture of tractable finite EPPFs. The distinctions between finite and infinite EPPFs are 
described in Pitman (2002, Section 2). 
3 Analysis of the exchangeable marginal distribution 

In this section we present a simpler description of the marginal distribution $\mathscr{M}$ and the 
corresponding EPPF. Such a description facilitates its practical implementation and indeed has 
theoretical implications as well. Note that ideally one would like an EPPF of the form 

$$p(e_1, \ldots, e_{n(\mathbf{p})}) = v_{n,n(\mathbf{p})} \prod_{j=1}^{n(\mathbf{p})} w_{e_j}, \tag{7}$$

where $v_{n,n(\mathbf{p})}$ is a positive quantity only depending on $n$ and $n(\mathbf{p})$, and the 
$w_{e_j}$ are a sequence of positive numbers each only depending on $e_j$ for 
$j = 1, \ldots, n(\mathbf{p})$. One aspect of such a representation is that it is easily sampled 
according to variations of general Chinese restaurant processes. Pitman (2002) refers to such EPPFs 
as having Gibbs form. However, it is known [see Pitman (2002), Theorem 42, p. 81] that the only 
infinite EPPFs admitting such a representation are the EPPFs derived from a Dirichlet process and 
those derived from a stable law of index $0 < \alpha < 1$. Among them, we mention the two-parameter 
Dirichlet process and the generalized gamma class of processes. Here again we show that the 
appropriate usage of the random variable $U_n$ leads to tractable descriptions of $\mathscr{M}$ and 
the EPPF. 

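These simplifications are deduced from the following joint distribution of 
$(\mathbf{p}, T_{n(\mathbf{p})}, U_n, \mathbf{Y})$ given by 
$\mathbb{P}(\mathbf{p}, T_{n(\mathbf{p})} \in dz, U_n \in du, \mathbf{Y} \in d\mathbf{y})$ equal to, 

$$\frac{1}{\Gamma(n)} u^{n-1} e^{-uz} f_T(z)\, du\, dz \prod_{j=1}^{n(\mathbf{p})} \tau_{e_j}(u|y_j)\eta(dy_j). \tag{8}$$

This distribution appears naturally in our derivation of the posterior distribution given in the 
appendix. Now, for fixed $u > 0$ and $\mathbf{p}$, let 

$$\mathbb{P}(Y_j \in dy | U_n = u, \mathbf{p}) := H_{j,n}(dy|u) = \frac{\tau_{e_{j,n}}(u|y)\eta(dy)}{\kappa_{e_{j,n}}(u)}, \quad j = 1, \ldots, n(\mathbf{p}). \tag{9}$$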


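Theorem 3.1 Let $\mathbf{X}$ denote the random variables with the exchangeable distribution 
$\mathscr{M}$ described in Proposition 2.3. Then the distribution of $\mathbf{X}$ may be described as 
follows. 

(i) The distribution of $\mathbf{X}|U_n = u, \mathbf{p}$ is such that the unique values 
$Y_1, \ldots, Y_{n(\mathbf{p})}$ are independent with respective distributions given by (9). In the 
homogeneous case, it follows that $Y_1, \ldots, Y_{n(\mathbf{p})}|\mathbf{p}$ are iid with common 
distribution $\eta(dx)/\eta(\mathscr{X}) = H(dx)$ not depending on $U_n$. 

(ii) The distribution of $\mathbf{p}|U_n = u$, 
$\mathbb{P}(\mathbf{p} = \{C_1, \ldots, C_{n(\mathbf{p})}\}|U_n = u)$, is a conditional finite EPPF 
given by 

$$p(e_1, \ldots, e_{n(\mathbf{p})}|u) = \frac{e^{-\psi(u)} \prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)}{\int_0^{\infty} t^{n} e^{-ut} f_T(t)\, dt} = \frac{\prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)}{E[T^{n}_{n(\mathbf{p})}|U_n = u]}, \tag{10}$$

where $E[T^{n}_{n(\mathbf{p})}|U_n = u] = e^{\psi(u)} \int_0^{\infty} t^{n} e^{-ut} f_T(t)\, dt$. That is, 
conditional on $U_n = u$, $\mathbf{p}$ is a finite Gibbs partition. 

(iii) The marginal density of $U_n = \Gamma_n/T$ is 

$$f_{U_n}(u) = [\Gamma(n)]^{-1} u^{n-1} \int_0^{\infty} t^{n} e^{-ut} f_T(t)\, dt.$$
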
Proof. Statement (i) follows by applying Bayes rule to (8). Statement (iii) is straightforward. 
An application of Bayes rule also yields readily a description of the distribution of 
$\mathbf{p}|U_n = u$, where the normalizer is 
$\sum_{\mathbf{p}} \prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)$. Here $\sum_{\mathbf{p}}$ stands for the 
sum over all partitions of the set of integers $\{1, \ldots, n\}$. The simpler form in (10) may be 
obtained by noting some known relationships between cumulants, partitions and moments. However, for 
immediate clarity one can use (8) to establish the identity 
$f_{U_n}(u) = [\Gamma(n)]^{-1} u^{n-1} e^{-\psi(u)} \sum_{\mathbf{p}} \prod_{j=1}^{n(\mathbf{p})} \kappa_{e_j}(u)$. 
The result then follows by noting the form of $f_{U_n}(u)$ given in (iii). □ 
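
The normalization in (10) can be checked by brute force in the gamma case of Remark 8, where the cumulants are $\theta \Gamma(j)(1+u)^{-j}$ and the conditional moment of $T_{n(\mathbf{p})}$ is $(1+u)^{-n}\Gamma(\theta+n)/\Gamma(\theta)$. The Python sketch below enumerates all set partitions of a small index set and sums the conditional EPPF; the helper names and parameter values are illustrative assumptions.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # or open a new block containing only `first`
        yield [[first]] + partition

def conditional_eppf(block_sizes, u, theta):
    """Formula (10) in the gamma case: prod_j kappa_{e_j}(u) / E[T^n | U_n = u]."""
    n = sum(block_sizes)
    numerator = math.prod(theta * math.gamma(e) * (1.0 + u) ** (-e) for e in block_sizes)
    moment = (1.0 + u) ** (-n) * math.gamma(theta + n) / math.gamma(theta)
    return numerator / moment

theta, u, n = 2.3, 0.7, 6
total = sum(conditional_eppf([len(b) for b in p], u, theta)
            for p in set_partitions(list(range(n))))
```

Here `total` should equal one up to rounding, which is the identity between the sum over partitions of the cumulant products and the conditional moment used in the proof.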

The next proposition offers another description of the distribution of the unique values. 

Proposition 3.1 Suppose that $\mathbf{X}$ has distribution $\mathscr{M}$. Then using the decomposition 
$\nu(ds, dx) = F_0(dx|s)\rho(ds)$, it follows that the distribution of $Y_1, \ldots, Y_{n(\mathbf{p})}$ 
given $U_n, J_{1,n}, \ldots, J_{n(\mathbf{p}),n}, \mathbf{p}$ is such that the $Y_j$ are independent with 
respective distributions $\mathbb{P}(Y_j \in dy|J_{j,n}) = F_0(dy|J_{j,n})$. That is, the 
conditional distributions only depend on $U_n$ through the $(J_{j,n})$. 

Remark 5. The above results demonstrate the important role that $U_n$ plays in simplifying the 
above quantities. In effect, conditioning on $U_n$ reveals conditional likelihoods that have exponential 
form. This exponential form bears resemblance to those appearing in the posterior analysis of neutral 
to the right processes and the Lévy moving average models discussed in James (2005a). Models of 
this type are most amenable to the direct application of James (2005a, Proposition 2.3). It is then 
not surprising that one may notice some similarities between our posterior characterizations and 
those described by Doksum (1974), Ferguson (1974), Hjort (1990) and Kim (1999) for NTR models. 
We point out however that these models are otherwise very different, see James (2005a, 2005c). 

Remark 6. We note further that conditioning on $T$, instead of $U_n$, does not lead to simplified 
expressions. See however Pitman (2003) for important interpretations of conditioning on $T$ in the 
stable case. 

We close this section by showing how $U_n$ is related to what is called the real inversion formula for 
the Laplace transform as described in Bondesson (1992, eq. (6.2.1), p. 92) [see Feller (1971, VII)]. 
In some sense it shows that $U_n$ is asymptotically sufficient for $T$. The result below follows directly 
from Feller (1971, VII). 

Proposition 3.2 Let $Y_n = n/U_n = nT/\Gamma_n$. Then the pdf of $Y_n$ is given by 

$$f_{Y_n}(y) = \frac{1}{\Gamma(n+1)} \left(\frac{n}{y}\right)^{n+1} \int_0^{\infty} e^{-nt/y} t^{n} f_T(t)\, dt = \frac{(-1)^{n}}{n!} \left(\frac{n}{y}\right)^{n+1} \phi^{(n)}(n/y), \tag{11}$$

where $\phi^{(n)}$ denotes the $n$-th derivative of $\phi$. Additionally it follows that $f_{Y_n}(t)$ 
converges to $f_T(t)$ uniformly in every finite interval, as $n \to \infty$. Hence (11) is an 
inversion formula for the Laplace transform of $T$. 
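
The convergence in Proposition 3.2 can be seen numerically when $T$ is assumed to be a gamma random variable with shape $\theta$ and scale 1, since the integral in (11) then has a closed form. The Python sketch below evaluates (11) on the log scale for increasing $n$; the choice of distribution, the evaluation point and the function names are illustrative assumptions.

```python
import math

def inversion_density(y, n, theta):
    """f_{Y_n}(y) from (11) when T ~ Gamma(theta, 1), computed on the log scale."""
    lam = n / y
    # int_0^inf t^n e^{-lam t} f_T(t) dt = Gamma(n + theta) / Gamma(theta) * (1 + lam)^{-(n + theta)}
    log_integral = math.lgamma(n + theta) - math.lgamma(theta) - (n + theta) * math.log1p(lam)
    log_density = -math.lgamma(n + 1) + (n + 1) * math.log(lam) + log_integral
    return math.exp(log_density)

def gamma_density(y, theta):
    """Density of T ~ Gamma(theta, 1)."""
    return math.exp((theta - 1.0) * math.log(y) - y - math.lgamma(theta))

theta, y = 2.5, 1.5
errors = [abs(inversion_density(y, n, theta) - gamma_density(y, theta))
          for n in (10, 100, 1000, 10000)]
```

The entries of `errors` shrink as $n$ grows, in line with the statement that the density of $Y_n$ approaches that of $T$.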

Remark 7. Comparing (10) with (7) shows that conditional on $U_n$, the distribution of $\mathbf{p}$ is a 
finite EPPF of Gibbs form. In particular for fixed $u$, we see that in this case, 

$$v_{n,n(\mathbf{p})} = \left\{E[T^{n}_{n(\mathbf{p})}|U_n = u]\right\}^{-1} = \left\{e^{\psi(u)} \int_0^{\infty} t^{n} e^{-ut} f_T(t)\, dt\right\}^{-1},$$

which importantly does not depend on $n(\mathbf{p})$, and $w_{e_j} = \kappa_{e_{j,n}}(u)$. This has many 
interesting consequences, of which we shall highlight a few in the forthcoming sections. 
3.1 Distributional results and moment formulae for complex functionals 

The next result gives an expression for the distribution of $n(\mathbf{p})$ given $U_n$. 

Proposition 3.3 Let $n_1 \geq n_2 \geq \cdots \geq n_k \geq 1$ denote an ordering (composition) of the 
corresponding $(e_1, \ldots, e_k)$. Recall that $n(\mathbf{p})$ represents the number of unique values of 
$(X_1, \ldots, X_n)$. Then under the distribution $\mathscr{M}$, the conditional distribution of 
$n(\mathbf{p})$ given $U_n = u$ is given by 





for $k = 1, \ldots, n$, where the sum is over all compositions $(n_1, \ldots, n_k)$ of $n$ of 
size $n(\mathbf{p}) = k$. 
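
Because the partition is of Gibbs form given $U_n = u$ (Remark 7), the probability of observing $k$ blocks can be computed with partial Bell polynomials, a standard device for Gibbs partitions. The Python sketch below uses that device with the gamma-case weights of Remark 8; it is an illustration under those assumptions, with hypothetical function names, and is not claimed to reproduce the display of Proposition 3.3 term by term.

```python
import math
from functools import lru_cache

def block_count_distribution(n, weights, normalizer):
    """P(n(p) = k | u), k = 1..n, for a Gibbs EPPF with block weights w_j and constant 1/normalizer."""
    @lru_cache(maxsize=None)
    def bell(m, k):
        # partial Bell polynomial B_{m,k}(w): sum over set partitions of m items into k blocks
        if m == 0 and k == 0:
            return 1.0
        if m == 0 or k == 0:
            return 0.0
        # condition on the size i of the block containing a fixed item
        return sum(math.comb(m - 1, i - 1) * weights[i] * bell(m - i, k - 1)
                   for i in range(1, m - k + 2))
    return [bell(n, k) / normalizer for k in range(1, n + 1)]

theta, u, n = 1.2, 0.5, 8
# kappa_j(u) in the gamma case
weights = {j: theta * math.gamma(j) * (1.0 + u) ** (-j) for j in range(1, n + 1)}
# E[T^n | U_n = u] in the gamma case
normalizer = (1.0 + u) ** (-n) * math.gamma(theta + n) / math.gamma(theta)
probs = block_count_distribution(n, weights, normalizer)
```

In this example `probs[k - 1]` is the conditional probability of $k$ distinct values, and the entries sum to one; the dependence on $u$ cancels, as expected for the Dirichlet process.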

The fact that $\mathbf{p}|U_n$ is a finite Gibbs partition allows us to apply Pitman (2002, eq. 98) [see 
also Kolchin (1986)] to immediately deduce the following generalization of the Ewens sampling formula, 
Ewens (1972), and equivalently Antoniak (1974, Proposition 3). 

Proposition 3.4 Define a random vector $(|\Pi_{i,n}|, \text{ for } 1 \leq i \leq n)$ of non-negative 
counts by $|\Pi_{i,n}| = \sum_{j=1}^{n(\mathbf{p})} \mathbb{I}(e_j = i)$ for $i = 1, \ldots, n$, where 
$|\Pi_{i,n}|$ denotes the number of cells of size $i$. 

The distribution of $(|\Pi_{i,n}|, \text{ for } 1 \leq i \leq n)$ given $U_n = u$ can be represented as 



( 12 ) 


where $\sum_{j=1}^{n} m_j = k$ and $\sum_{j=1}^{n} j m_j = n$. Equivalently, (12) is the conditional 
distribution, given $U_n$, of the number of values of $(X_1, \ldots, X_n)$ appearing 1 time, 2 times, 
etc., corresponding to the numbers $(m_1, \ldots, m_n)$, when $\mathbf{X}$ has distribution 
$\mathscr{M}$. 

The recognition that $\mathscr{M}$ is the $n$-th moment measure of $P$ allows one to obtain easily 
otherwise complex expressions for moments of functionals of $P$. The discussion in Ishwaran and James 
(2003a, section 3.2) combined with Proposition 2.1 leads to the following formula. Recall that 
$\sum_{\mathbf{p}}$ denotes the sum over all partitions of the set $\{1, \ldots, n\}$. 

Proposition 3.5 Suppose that $g_1, \ldots, g_n$ are real-valued functions on $\mathscr{X}$ and define 
the functionals $P(g_i) = \int_{\mathscr{X}} g_i(x) P(dx)$ for $i = 1, \ldots, n$. Assume that for each 
$n$, $E[\prod_{i=1}^{n} (P(g_i))] < +\infty$. 

(i) Then, $E[\prod_{i=1}^{n} P(g_i)]$ is equal to, 



(13) 


(ii) For integers $n_1, \ldots, n_q$ chosen such that $\sum_{i=1}^{q} n_i = n$ for an integer 
$q \leq n$, it follows that $E[\prod_{i=1}^{q} (P(g_i))^{n_i}]$ is equal to 


(14) 



where $e_{j,i}$ denotes the number of indices in $C_j$ associated with $g_i$. 
Remark 8. It is interesting to note that all our results conditioned on $U_n$ contain the known 
unconditional results for the Dirichlet process. This is because the Dirichlet process is independent 
of $U_n$. To see this, note that the Dirichlet process with total mass $\theta$ corresponds to the choice 
of $\rho(ds) = \theta s^{-1} e^{-s} ds$. It follows that for each $j$, 
$\kappa_j(u) = \theta (1+u)^{-j} \Gamma(j)$ and 
$E[T^{n}_{n(\mathbf{p})}|u] = (1+u)^{-n} [\Gamma(\theta+n)/\Gamma(\theta)]$. Additionally, 
$f_{U_n}(u|\mathbf{p}) := f_{U_n}(u) \propto u^{n-1} (1+u)^{-(n+\theta)}$. That is, 
$U_n = \Gamma_n/T$ is a gamma-gamma random variable independent of $\mathbf{X}$. Equivalently 
$1/(1 + U_n)$ is a Beta$(\theta, n)$ random variable. Hence, (12) specializes to 

$$\mathbb{P}(|\Pi_{j,n}| = m_j, 1 \leq j \leq n | u) = \frac{n!}{\prod_{i=1}^{n} (\theta + i - 1)} \prod_{j=1}^{n} \frac{\theta^{m_j}}{j^{m_j} m_j!}.$$

This equates to the Ewens sampling formula derived by Ewens (1972), which is equivalent to the 
result in Antoniak (1974, Proposition 3) describing the number of values of $(X_1, \ldots, X_n)$ 
appearing 1 time, 2 times, etc., corresponding to the numbers $(m_1, \ldots, m_n)$. Additionally, note 
that (10) becomes, 

$$PD(\mathbf{p}|\theta) = \frac{\theta^{n(\mathbf{p})} \prod_{j=1}^{n(\mathbf{p})} (e_j - 1)!}{\prod_{i=1}^{n} (\theta + i - 1)} \tag{15}$$

which is the variant of the Ewens sampling formula often called the Chinese restaurant process. [See 
Pitman (2002, p. 60) and Ishwaran and James (2003a)]. The calculations for the Dirichlet process 
involving $U_n$ may be found in James (2005b), where it is shown that $U_n$ and its variants still play a 
significant role. 
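
The Ewens sampling formula recalled in Remark 8 is a probability distribution over the ways of writing $n$ as a sum of block sizes. The Python sketch below, with illustrative function names and parameter values, evaluates the formula for every integer partition of a small $n$ and sums the results.

```python
import math

def integer_partitions(n, largest=None):
    """Yield partitions of n as non-increasing lists of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - part, part):
            yield [part] + tail

def ewens_probability(block_sizes, theta):
    """Ewens sampling formula for the counts m_j = number of blocks of size j."""
    n = sum(block_sizes)
    rising = math.prod(theta + i for i in range(n))  # prod_{i=1}^n (theta + i - 1)
    prob = math.factorial(n) / rising
    for j in set(block_sizes):
        m_j = block_sizes.count(j)
        prob *= theta ** m_j / (j ** m_j * math.factorial(m_j))
    return prob

theta, n = 0.9, 10
total = sum(ewens_probability(sizes, theta) for sizes in integer_partitions(n))
```

The value of `total` should be one up to rounding, confirming that the counts $(m_1, \ldots, m_n)$ with $\sum_j j m_j = n$ exhaust the support.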


4 Mixture models 

In terms of statistical applications, owing to the success of the Dirichlet process, one of the most 
fruitful ways of exploiting NRMs is their potential use as basic building blocks in hierarchical 
mixture models. In this setting, $X_1, \ldots, X_n$ are missing values which capture the clustering 
structure within the data. This class of models was first introduced, for the Dirichlet process, by Lo 
(1984) and later popularized by the development of suitable MCMC techniques in Escobar and West (1995). 
See Dey, Müller and Sinha (1998) and Ishwaran and James (2001, 2003a,b) for subsequent developments. 
Recently, mixtures of Dirichlet processes have been generalized to mixtures of stick-breaking priors 
in Ishwaran and James (2001, 2003b) and random measures driven by increasing additive processes 
in Nieto-Barajas, Prünster and Walker (2004). A recent example of application of this class is 
provided in Lijoi, Mena and Prünster (2004) where the clustering behaviour is modeled according to 
a normalized inverse Gaussian process. Ishwaran and James (2003a) also introduce a general class of 
species sampling mixture models and describe various algorithms for efficient implementation. See 
also Hoff (2003, section 4) for an interesting use of the Dirichlet process mixture model framework. 
Those ideas naturally extend to models based on NRMs. 

We first recall the model as set up by Lo (1984). Suppose $\{f(\cdot\mid x) : x \in \mathscr{X}\}$ is a family of 
non-negative kernels defined on a Polish space $\mathbb{W}$ such that $\int_{\mathbb{W}} f(w\mid x)\,\lambda(dw) = 1$ for any $x$ in $\mathscr{X}$ 
and for some $\sigma$-finite measure $\lambda$. Next, let $\mathbf{W} = (W_1,\ldots,W_n)$ be a vector of $\mathbb{W}$-valued random 
elements such that, given $X_1,\ldots,X_n$ from an NRM $P$, they are independent and $W_i$ admits density 
$f(\cdot\mid X_i)$ with respect to $\lambda$. This is the same as supposing that $W_1,\ldots,W_n$ are exchangeable draws 
from the random density $f(\cdot) = \int_{\mathscr{X}} f(\cdot\mid x)\,P(dx)$. One is naturally interested in the determination 
of the posterior distribution of the density $f$, given the observations $\mathbf{W}$. However, one gains more 
flexibility in working directly with the posterior distribution of $P$ or $\mu$ given $\mathbf{W}$. That is, $f$ is then 
seen as one of many possibly interesting functionals of $P$. Moreover, under certain identifiability 
assumptions, the estimation of the mixing distribution $P$ is of primary concern. 
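
To make the hierarchical structure concrete, the following is a minimal simulation sketch of Lo's model in the Dirichlet process special case. Everything specific in it is an illustrative assumption rather than part of the paper: the function name `simulate_lo_mixture`, the truncation level, the Gaussian kernel $f(\cdot\mid x)$ and the Gaussian base measure $H$.

```python
import random

def simulate_lo_mixture(n, theta=1.0, trunc=200, kernel_sd=0.5, base_sd=3.0, seed=0):
    """Draw (W, X) from the mixture model with a truncated Dirichlet process P.

    Assumptions (illustrative only): P is approximated by `trunc` stick-breaking
    atoms, H = N(0, base_sd^2) and f(. | x) = N(x, kernel_sd^2).
    """
    rng = random.Random(seed)
    # Truncated stick-breaking weights of P.
    weights, remaining = [], 1.0
    for _ in range(trunc - 1):
        w = rng.betavariate(1.0, theta)
        weights.append(remaining * w)
        remaining *= 1.0 - w
    weights.append(remaining)  # leftover mass goes to the last atom
    atoms = [rng.gauss(0.0, base_sd) for _ in range(trunc)]  # atoms iid from H
    # Latent X_i | P are iid from P; W_i | X_i has density f(. | X_i).
    xs = rng.choices(atoms, weights=weights, k=n)
    ws = [rng.gauss(x, kernel_sd) for x in xs]
    return ws, xs
```

The ties among the latent `xs` are what encode the clustering of the observed `ws`.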

Notice that the above description shows that the joint distribution of $(\mathbf{W}, \mathbf{X}, P, U_n)$ can be 
written as 

\[
\Big[\prod_{i=1}^{n} f(W_i\mid X_i)\Big]\,\mathbb{P}(dP\mid \mathbf{X})\,\mathscr{M}(d\mathbf{X}\mid u)\,f_{U_n}(u),
\]


where $\mathbb{P}(dP\mid\mathbf{X})$ denotes the posterior distribution of $P$ described by Theorem 2.1 or Proposition 
2.2. One could then apply arguments similar to those exploited in Lo (1984) and in Ishwaran 
and James (2003a) to yield analogous characterizations of the posterior distribution. We shall not 
present those here. In the next section we will describe a general Monte Carlo scheme which can be used to 
sample from the posterior distribution. For a better understanding of this connection we note that 
an application of Proposition 2.4 shows that the marginal distribution of $\mathbf{W}$ is given by 


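\[
\int_0^\infty \Big[\sum_{\mathbf{p}} \pi_n(\mathbf{p}\mid u) \prod_{j=1}^{n(\mathbf{p})} \int_{\mathscr{X}} \Big[\prod_{i\in C_j} f(W_i\mid y)\Big] H_{j,n}(dy\mid u)\Big]\, f_{U_n}(u)\,du.
\]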


This is a special case of <HS, with gi (y) = f(Wi\y). 


Remark 9. It is important to note that in mixture models both $\mathbf{X}$ and $U_n$ are viewed as missing 
values. Hence it is quite natural to work with the distribution of $\mathbf{X}\mid U_n$ as a prior and subsequently 
with that of $\mathbf{X}\mid U_n, \mathbf{W}$. 


5 Sampling from $\mathscr{M}$ and related functionals 

As mentioned earlier, obtaining a tractable form of the marginal distribution is crucial to both 
practical implementation and theoretical understanding of these models. In particular, understanding 
how to sample $X_1,\ldots,X_n$ from $\mathscr{M}$ is important for applications involving mixture models. We 
discuss briefly some ideas on how this may be done. One aspect of our expressions is the appearance 
of the cumulants $\kappa_i(u)$ and the corresponding moments, 

\[ m_n(u) = E\big[T^{\,n}_{n(\mathbf{p})} \mid U_n = u\big]. \]

In many cases either the cumulants are easy to calculate or the moments are. Moreover, one can 
use the following result of Thiele in order to recover one from the other. That is, for any integer $n$, 

\[
\kappa_n(u) = m_n(u) - \sum_{i=1}^{n-1} \binom{n-1}{i-1}\,\kappa_i(u)\,m_{n-i}(u). \tag{16}
\]

Many mathematical packages can easily deal with (16). Similar to the case of the Dirichlet process, 
it is noted that many complex expressions can be approximated by obtaining draws from $\mathscr{M}$. Using 
Proposition 2 and Corollary 1 a draw from $\mathscr{M}$ may be conducted as follows. First one draws 
$U_n = \Gamma_n/T$, either directly or by drawing the independent pair $(\Gamma_n, T)$ according to the 
gamma density of $\Gamma_n$ and the density $f_T$. Given $U_n = u$, one draws $\mathbf{p}$ from $\pi_n(\mathbf{p}\mid u)$. Given $\mathbf{p}$ 
and $U_n = u$, one finally draws $\mathbf{X}$, which amounts to independently sampling $Y_j$ from $H_{j,n}(\cdot\mid u)$, for 
$j = 1,\ldots,n(\mathbf{p})$. Since the normalizing constant, $m_n(u)$, in $\pi(\mathbf{p}\mid u)$ is fairly simple, one may often be 
able to devise a simple scheme to draw $\mathbf{p}$, given $U_n = u$, exactly. If this is not the case, one can use a 
simple variation of a weighted Chinese restaurant (WCR) process [see Lo, Brunner, and Chan (1996) 
and Ishwaran and James (2003a)], which can be deduced from James (2002, Lemma 2.3). 
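
The Thiele recursion between cumulants and moments is easy to code. The sketch below is only an illustration: the function names are hypothetical, and the inputs are plain lists holding the values $\kappa_1(u),\ldots,\kappa_n(u)$ or $m_1(u),\ldots,m_n(u)$ at one fixed $u$.

```python
from math import comb

def cumulants_from_moments(m):
    """Given m[k-1] = k-th moment, return kappa with kappa[k-1] = k-th cumulant.

    Uses kappa_n = m_n - sum_{i=1}^{n-1} C(n-1, i-1) * kappa_i * m_{n-i}.
    """
    kappa = []
    for n in range(1, len(m) + 1):
        s = sum(comb(n - 1, i - 1) * kappa[i - 1] * m[n - i - 1] for i in range(1, n))
        kappa.append(m[n - 1] - s)
    return kappa

def moments_from_cumulants(kappa):
    """Inverse direction: m_n = kappa_n + sum_{i=1}^{n-1} C(n-1, i-1) * kappa_i * m_{n-i}."""
    m = []
    for n in range(1, len(kappa) + 1):
        s = sum(comb(n - 1, i - 1) * kappa[i - 1] * m[n - i - 1] for i in range(1, n))
        m.append(kappa[n - 1] + s)
    return m
```

As a check, a standard exponential variable has moments $n!$ and cumulants $(n-1)!$, so `cumulants_from_moments([1, 2, 6, 24])` returns `[1, 1, 2, 6]`.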

5.1 Generalized Chinese Restaurant and Polya Urn procedures 


Here, we use the fact that these models are structurally similar to those discussed in section 4 of 
James (2005a). It follows that one can use section 4.4 of James (2005a) to deduce general extensions 
of the Polya Urn Gibbs samplers and SIS procedures given by Escobar (1994) and Liu (1996), and of the Gibbs 
sampling/SIS procedures based on a generalized Chinese restaurant process as mentioned above. As 
such, we will only sketch out the relevant probabilities and refer the reader to James (2005a), and 
references therein, for additional mechanics of the implementation. 

For greatest flexibility we will give the relevant probabilities needed to sample approximately 
from models related to a joint density proportional to 

\[
\Big[\prod_{i=1}^{n} g_i(X_i)\Big]\,\mathscr{M}(d\mathbf{X}\mid u),
\]

which is deducible from Proposition 3.5. We note that $\mathscr{M}(d\mathbf{X}\mid u)$ has an urn-type representation 
which can be deduced from an application of James (2005a, Proposition 5.1). Similar to 
James (2005a, equation 40), define for $r = 0,\ldots,n-1$ the conditional probabilities 


\[
P(X_{r+1}\in dx\mid \mathbf{X}_r,u) = \frac{l_{0,r}}{c_r}\,\lambda_r(dx\mid u) + \sum_{j=1}^{n(\mathbf{p}_r)}\frac{l_{j,r}(Y_j)}{c_r}\,\delta_{Y_j}(dx),
\]

where $\mathbf{X}_r = \{X_1,\ldots,X_r\}$, $\lambda_r(dx\mid u) \propto g_{r+1}(x)\,H_{1,1}(dx\mid u)$, 

\[
l_{0,r} = \int_{\mathscr{X}} g_{r+1}(y)\,H_{1,1}(dy\mid u) \quad\text{and}\quad l_{j,r}(y) = g_{r+1}(y)\,\frac{\tau_{1+e_{j,r}}(u\mid y)}{\tau_{e_{j,r}}(u\mid y)}.
\]

Additionally $c_r = l_{0,r} + \sum_{j=1}^{n(\mathbf{p}_r)} l_{j,r}(Y_j)$. Examining James (2005a, section 4.4) we see these are 
the ingredients to implement general analogues of the Polya Urn Gibbs Sampler and SIS procedures 
described by Escobar (1994) and Liu (1996). To get the Chinese restaurant type procedures one 
samples partitions $\mathbf{p}$ based on probabilities derived from $l_{0,r}$ and 

\[
l_{j,r} = \int_{\mathscr{X}} l_{j,r}(y) \prod_{i\in C_{j,r}} g_i(y)\,H_{j,r}(dy\mid u) \quad\text{for } j = 1,\ldots,n(\mathbf{p}_r),
\]

where $\mathbf{p}_r$ denotes a partition of the integers $\{1,\ldots,r\}$ and each $C_{j,r} = \{i \le r : X_i = Y_j\}$ denotes 
the corresponding cell. Additionally, $l(r) = l_{0,r} + \sum_{j=1}^{n(\mathbf{p}_r)} l_{j,r}$. In particular, applying the SIS WCR 
procedure described in James (2005a) now leads to sampling $\mathbf{p}$ from a density $q(\mathbf{p}\mid u)$, which 
satisfies 

\[
L(\mathbf{p}\mid u)\,q(\mathbf{p}\mid u) = \pi(\mathbf{p}\mid u) \prod_{j=1}^{n(\mathbf{p})} \int_{\mathscr{X}} \prod_{i\in C_j} g_i(y)\,H_{j,n}(dy\mid u),
\]

where $L(\mathbf{p}\mid u) = \prod_{r=1}^{n} l(r-1)/m_n(u)$. This is justified by James (2002, Lemma 2.3). Note that 
setting $g_i(X_i) = f(W_i\mid X_i)$ leads to sampling procedures for mixture models. Setting $g_i(x) = 1$ leads 
to sampling from $\mathscr{M}$. In particular this algorithm includes the classical Chinese restaurant process 
for the Dirichlet process. 
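
For orientation, the classical special case mentioned last can be sketched in a few lines. This is only the Dirichlet process Chinese restaurant process with $g_i \equiv 1$, where customer $r+1$ opens a new cell with probability $\theta/(\theta+r)$ and joins cell $j$ with probability $e_{j,r}/(\theta+r)$; it is not the general weighted scheme, and the function name is hypothetical.

```python
import random

def chinese_restaurant_partition(n, theta, seed=None):
    """Sample a partition of {1, ..., n} from the classical Chinese restaurant process.

    Returns (labels, sizes): labels[i] is the cell of item i, sizes[j] = e_j.
    """
    rng = random.Random(seed)
    sizes, labels = [], []
    for r in range(n):
        u = rng.random() * (theta + r)
        if u < theta:
            # Open a new cell (always happens for the first item).
            sizes.append(1)
            labels.append(len(sizes) - 1)
        else:
            # Join an existing cell with probability proportional to its size.
            u -= theta
            j = 0
            while j < len(sizes) - 1 and u >= sizes[j]:
                u -= sizes[j]
                j += 1
            sizes[j] += 1
            labels.append(j)
    return labels, sizes
```

In the general case one would replace the weights $\theta$ and $e_{j,r}$ by $l_{0,r}$ and $l_{j,r}$ and carry along the importance weight $L(\mathbf{p}\mid u)$.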


6 Illustrative Examples 

Here we study two examples which are connected to the Dirichlet process, but require a more delicate 
analysis. In section 7, we address a more involved class of models. Hereafter, we let $\mathscr{B}(a,b)$ denote 
the fact that a random variable has a Beta distribution with parameters $(a,b)$. Let $\mathscr{B}(x\mid a,b)$ denote 
its density. Similarly $\mathscr{G}(a)$ denotes the law of a gamma random variable with shape $a$ and scale 1. 
$G_a$ denotes the corresponding gamma random variable having density $f_{G_a}(x) = \mathscr{G}(x\mid a)$. 

6.1 Classes of Dependent Dirichlet processes 

Here we present a large class of models which share the same EPPF as the Dirichlet process but 
are otherwise substantially more complex. This class is seen to add more flexibility to the Dirichlet 
process, and may be of particular interest in mixture models. One can also see that a study of a 
subclass of such models is also related to the exposition in Aldous and Pitman (2002). 

That is, we build NRMs based on 

\[
\nu_{0,\theta}(ds,dx) = F_0(dx\mid s)\,\theta s^{-1}e^{-s}\,ds. \tag{17}
\]

It is evident that $T = G_\theta$. A careful examination of (17) shows that a class of dependent Dirichlet 
processes may be described in terms of a stick-breaking representation, 

\[
P_\theta(\cdot) = \sum_{i=1}^{\infty} Q_i\,\delta_{Z_i}(\cdot) \quad\text{with}\quad Q_i = W_i\prod_{j=1}^{i-1}(1-W_j). \tag{18}
\]

The $W_i$ are independent Beta$(1,\theta)$, corresponding to the usual representation of Sethuraman (1994), 
but now the $(Z_i)$ are no longer independent of the $(W_i)$. A technical point is that $Q_i = W_i\prod_{j=1}^{i-1}(1-W_j)$ 
are the points ranked by size-biased sampling and the corresponding $Z_i$ has conditional 
distribution depending on $TQ_i = J_i$, i.e. $P(Z_i\in dz\mid J_1,J_2,\ldots) = F_0(dz\mid J_i)$. Note importantly that the 
distribution of $J_i$ is much more manageable than the distribution of the ranked points of a gamma 
process. 
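
A truncated draw from this representation can be sketched as follows. The sketch assumes, purely for illustration, that $F_0(\cdot\mid a)$ is a gamma law with shape $\delta$ and rate $a$ (the kernel that appears later in Proposition 6.2); the function name and the truncation level `k` are hypothetical.

```python
import random

def truncated_dependent_dp(theta, delta, k, seed=None):
    """First k weights and atoms of the dependent Dirichlet process.

    Assumption (illustrative): F_0(. | a) is Gamma(shape=delta, rate=a).
    Returns (weights, atoms, remaining), where `remaining` is the unassigned mass.
    """
    rng = random.Random(seed)
    t = rng.gammavariate(theta, 1.0)           # total mass T = G_theta, independent of (Q_i)
    weights, atoms, remaining = [], [], 1.0
    for _ in range(k):
        w = rng.betavariate(1.0, theta)        # W_i ~ Beta(1, theta)
        q = remaining * w                      # Q_i = W_i * prod_{j<i} (1 - W_j)
        remaining *= 1.0 - w
        rate = max(t * q, 1e-300)              # J_i = T * Q_i, guarded against underflow
        # Z_i | J_i ~ F_0(. | J_i); gammavariate takes (shape, scale), so scale = 1 / rate.
        atoms.append(rng.gammavariate(delta, 1.0 / rate))
        weights.append(q)
    return weights, atoms, remaining
```

Unlike the usual Dirichlet process, the atom `atoms[i]` here depends on the size of its own weight through `t * q`.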

We can describe the posterior distribution in the following way. First, notice that 

\[
\nu_u(ds,dx) = e^{-us}F_0(dx\mid s)\,\theta s^{-1}e^{-s}\,ds, \tag{19}
\]

which implies that $(1+U_n)T_{n(\mathbf{p})} = G_\theta$ and $(1+U_n)\mu_{n(\mathbf{p})}(dx) = G_\theta\sum_{i=1}^{\infty} Q_i\,\delta_{Z_{i,n}}(dx)$. Here $(Q_i)$ have the 
same distribution as described above in (18), they are independent of $\mathbf{X}$ and $U_n$, while $(Z_{i,n})$ are now 
random variables depending on $U_n$. Specifically, conditioned on the sequence $(U_n, G_\theta Q_1, G_\theta Q_2,\ldots)$, 
the $(Z_{i,n})$ are independent with distributions 

\[
P(Z_{i,n}\in dx\mid U_n, G_\theta Q_1, G_\theta Q_2,\ldots) = F_0\big(dx\mid G_\theta Q_i/(1+U_n)\big) \tag{20}
\]

where $G_\theta$, $U_n$ and $(Q_i)$ are independent. Now setting $(1+U_n)J_{j,n} = G_{j,n}$, one has the conditional 
distributions of $G_{j,n}\mid U_n,\mathbf{X}$ and $G_{j,n}\mid U_n,\mathbf{p}$ as 

\[
P(G_{j,n}\in ds\mid Y_j,u) \propto F_0\big(dY_j\mid s/(1+u)\big)\,\mathscr{G}(s\mid e_{j,n})\,ds \tag{21}
\]

and 

\[
P(G_{j,n}\in ds\mid u) = \mathscr{G}(s\mid e_{j,n})\,ds.
\]

Additionally, set $T_n = (1+U_n)T$. That is, $T_n = G_\theta + \sum_{j=1}^{n(\mathbf{p})} G_{j,n}$. The conditional density of 
$V_n := 1/(1+U_n)$, given $\mathbf{X}$, is specified by 

\[
f_{V_n}(v\mid\mathbf{X}) \propto v^{\theta-1}(1-v)^{n-1}\prod_{j=1}^{n(\mathbf{p})}\int_0^\infty F_0(dY_j\mid sv)\,\mathscr{G}(s\mid e_{j,n})\,ds. \tag{22}
\]

$V_n\mid\mathbf{p}$ is a $\mathscr{B}(\theta,n)$ random variable, independent of $\mathbf{p}$. We now summarize some facts about this 
process in the next result. 


Theorem 6.1 Suppose that $P_\theta$ denotes a class of dependent Dirichlet processes defined as in (18) 
via the intensity (17). Note that $V_n := 1/(1+U_n)$ given $\mathbf{X}$ has distribution (22). Then the following 
results hold. 

(i) The posterior distribution of $P_\theta\mid V_n,\mathbf{X}$ is equivalent to the random probability measure 

\[
\frac{G_\theta}{T_n}\sum_{i=1}^{\infty} Q_i\,\delta_{Z_{i,n}} + \sum_{j=1}^{n(\mathbf{p})} Q_{j,n}\,\delta_{Y_j},
\]

where the $(Q_i)$ are equivalent in distribution to those in (18), and are independent of $\mathbf{X}$. 
The sequence $(Z_{i,n})$ has distribution specified by (20). Additionally $Q_{j,n} = G_{j,n}/T_n$, where the 
distribution of $G_{j,n}, T_n\mid V_n,\mathbf{X}$ is specified by (21). 

(ii) The distributions of $((G_{j,n}), T_n, (Q_{j,n}))\mid\mathbf{p}$ are the same as given $V_n$ and $\mathbf{p}$, and equate to the 
classical results for the Dirichlet process posterior distribution. That is, given $\mathbf{p}$, $(Q_{1,n},\ldots,Q_{n(\mathbf{p}),n})$ 
is a Dirichlet$(e_{1,n},\ldots,e_{n(\mathbf{p}),n};\theta)$ vector. Equivalently, each $Q_{j,n}$ is $\mathscr{B}(e_{j,n},\theta+n-e_{j,n})$, since 
the $(G_{j,n})$ given $\mathbf{p}$ are independent $\mathscr{G}(e_{j,n},1)$ and $T_n\mid\mathbf{p}$ is $\mathscr{G}(\theta+n)$. 
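
The gamma construction behind statement (ii) can be sketched directly. The helper below is hypothetical; it only illustrates that normalizing independent gamma variables by their total produces the Dirichlet weights on the observed cells, together with the mass $G_\theta/T_n$ left for the unobserved part.

```python
import random

def posterior_dirichlet_weights(sizes, theta, rng):
    """Given cell sizes e_1, ..., e_k, draw (Q_{1,n}, ..., Q_{k,n}) and G_theta / T_n."""
    g = [rng.gammavariate(e, 1.0) for e in sizes]   # G_{j,n} ~ Gamma(e_j), independent
    g_theta = rng.gammavariate(theta, 1.0)          # G_theta ~ Gamma(theta)
    t_n = g_theta + sum(g)                          # T_n ~ Gamma(theta + n)
    return [x / t_n for x in g], g_theta / t_n
```

For instance `posterior_dirichlet_weights([3, 1, 2], 1.5, random.Random(0))` returns three cell weights and a remainder that together sum to one.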

It is evident again that the prediction rule does not in general have a simple form. However, the 
next result yields a nice description of the marginal distribution of $\mathbf{X}$. 

Proposition 6.1 Suppose that $P_\theta$ is a dependent Dirichlet process as described above. Then the 
marginal distribution of $\mathbf{X}\mid V_n = v$, where $V_n$ is $\mathscr{B}(\theta,n)$, is given by 

\[
PD(\mathbf{p}\mid\theta)\prod_{j=1}^{n(\mathbf{p})}\int_0^\infty F_0(dy_j\mid sv)\,\mathscr{G}(s\mid e_{j,n})\,ds. \tag{23}
\]

The expression in (23) shows that, conditionally on $V_n = v$ and $\mathbf{p}$, the $Y_1,\ldots,Y_{n(\mathbf{p})}$ are 
independent with respective distributions 

\[
P(Y_j\in dy\mid v) = \int_0^\infty F_0(dy\mid sv)\,\mathscr{G}(s\mid e_{j,n})\,ds.
\]

Additionally, $Y_1,\ldots,Y_{n(\mathbf{p})}\mid (G_{j,n}), V_n = v, \mathbf{p}$ are independent with distributions $F_0(dy_j\mid G_{j,n}v)$. 

Notice that in every case the distribution of $(Y_j)\mid V_n,\mathbf{p}$ is expressible via gamma mixing 
measures. The next result demonstrates a particularly simple case. 

Proposition 6.2 Suppose that for $a > 0$, $\delta > 0$, $F_0(dy\mid a) \propto a^{\delta}y^{\delta-1}e^{-ay}\,dy$ corresponds to a gamma 
random variable. Then $Y_1,\ldots,Y_{n(\mathbf{p})}\mid V_n,\mathbf{p}$ are independent with each $Y_j = R_j/V_n$, where $R_j$ is 
independent of $V_n$ and the distribution of $R_j\mid\mathbf{p}$ is 

\[
f_{R_j}(r\mid\mathbf{p}) \propto r^{\delta-1}(1+r)^{-(\delta+e_{j,n})} \quad\text{for } 0 < r < \infty.
\]

Or equivalently $1/(1+R_j)$ given $\mathbf{p}$ is $\mathscr{B}(e_{j,n},\delta)$. Additionally the $Z_{i,n}$ appearing in Proposition 6.1 
satisfy $Z_{i,n} = R_i/(Q_iV_n)$ where $1/(1+R_i)$ is independent of $(Q_i)$, $V_n$ and has a $\mathscr{B}(1,\theta)$ distribution. 
The distribution of the $Z_i$ is given by setting $V_0 := 1$. 

Remark 10. Note that, unlike the case of the usual Dirichlet process, $V_n$ is not independent of 
$\mathbf{X}$; however, its distribution is independent of $\mathbf{p}$ and the marginal distribution of $V_n$ is the same in 
both the dependent and classical case. Proposition 6.2 shows that it is easy to sample from $\mathscr{M}$. Since 
$V_n$ and $\mathbf{p}$ are independent, one can draw $\mathbf{p}$ from $PD(\mathbf{p}\mid\theta)$ according to the classical Chinese restaurant process, 
then independently sample $V_n$ from a $\mathscr{B}(\theta,n)$. It then remains to draw the $Y_1,\ldots,Y_{n(\mathbf{p})}$ conditional 
on $(V_n,\mathbf{p})$ or, as an intermediate step, conditional on $((G_{j,n}), V_n = v, \mathbf{p})$. 
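
The three steps of Remark 10 can be sketched as follows under the gamma kernel of Proposition 6.2. The function name is hypothetical and the code is only a sketch of the recipe as stated here: Chinese restaurant partition, then $V_n \sim \mathscr{B}(\theta,n)$, then $Y_j = R_j/V_n$ with $1/(1+R_j) \sim \mathscr{B}(e_{j,n},\delta)$.

```python
import random

def sample_dependent_dp_marginal(n, theta, delta, seed=None):
    """Draw X_1, ..., X_n from the marginal law of the dependent Dirichlet process.

    Assumes the gamma kernel F_0 of Proposition 6.2. Returns (xs, labels, v).
    """
    rng = random.Random(seed)
    # Step 1: partition p from the classical Chinese restaurant process.
    sizes, labels = [], []
    for _ in range(n):
        j = rng.choices(range(len(sizes) + 1), weights=sizes + [theta])[0]
        if j == len(sizes):
            sizes.append(1)      # open a new cell
        else:
            sizes[j] += 1        # join cell j
        labels.append(j)
    # Step 2: V_n ~ Beta(theta, n), independent of the partition.
    v = rng.betavariate(theta, n)
    # Step 3: unique values Y_j = R_j / V_n, with 1 / (1 + R_j) ~ Beta(e_j, delta).
    ys = []
    for e in sizes:
        b = rng.betavariate(e, delta)
        ys.append((1.0 / b - 1.0) / v)
    return [ys[j] for j in labels], labels, v
```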

6.2 NRMs derived from Beta processes 

Suppose that $\rho(ds\mid x) = c(x)\,s^{-1}(1-s)^{c(x)-1}\,ds$, for $0 < s < 1$. The corresponding process $\mu$ is a 
beta process. Exploiting beta processes as an ingredient for constructing NRM models leads to a 
different posterior behaviour from the one analyzed in Hjort (1990). The Lévy measure associated 
with $\mu_u$ is 

\[
\nu_u(ds,dx) = e^{-us}\,c(x)\,s^{-1}(1-s)^{c(x)-1}\,ds\,\eta(dx) \quad\text{for } s\in(0,1]
\]

and, hence, $\mu_u$ is not a beta process. Additionally, the distribution of the jumps $J_{j,n}$, given $(U_n,\mathbf{X})$, 
is 

\[
P(J_{j,n}\in ds\mid Y_j,u) = \frac{e^{-us}\,s^{e_{j,n}-1}(1-s)^{c(Y_j)-1}\,ds}{B(e_{j,n},c(Y_j))\;{}_1F_1(e_{j,n};\,c(Y_j)+e_{j,n};\,-u)}
\]

where ${}_1F_1$ is the confluent hypergeometric function and $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. It follows that one can write 
$\mathscr{M}(dx_i\mid x_1,\ldots,x_{i-1},u)$ as proportional to 

\[
{}_1F_1(1;\,c(x_i)+1;\,-u)\,\eta(dx_i) + \sum_{j=1}^{n(\mathbf{p}_{i-1})}\frac{e_{j,i-1}}{c(y_j)+e_{j,i-1}}\,\frac{{}_1F_1(e_{j,i-1}+1;\,c(y_j)+e_{j,i-1}+1;\,-u)}{{}_1F_1(e_{j,i-1};\,c(y_j)+e_{j,i-1};\,-u)}\,\delta_{y_j}(dx_i).
\]

Finally, the marginal distribution of $\mathbf{X}$, given $U_n$, can be represented as 

\[
[m_n(u)]^{-1}\prod_{j=1}^{n(\mathbf{p})}\frac{\Gamma(e_{j,n})\Gamma(c(y_j)+1)}{\Gamma(e_{j,n}+c(y_j))}\;{}_1F_1(e_{j,n};\,c(y_j)+e_{j,n};\,-u)\,\eta(dy_j),
\]

where $m_n(u)$ can be expressed using (16), where $\kappa_1(u) = m_1(u)$. 

Remark 11. We point out that setting $c(x) := 1$ and letting $\eta(dx) = \theta H(dx)$ yields models 
defined by the scale invariant Poisson process, which is of importance in a variety of applications. 
In particular it is known that $T$ has the important Dickman distribution. Moreover, in this case, 
the class of NRMs has been discussed previously. Specifically, Arratia, Barbour and Tavaré (1999, 
2004) show that the distribution of $P\mid T < 1$ is in fact a Dirichlet process with shape $\theta$. See Arratia, 
Barbour and Tavaré (2004) for further implications and details. 


7 Generalized Gamma Convolution processes: NRM related 
to Dirichlet mean functionals 

This last example demonstrates the great flexibility of the $h$-biased framework, which allows us to 
describe a large class of NRMs in terms of the tractable $(Q_i)$ of the Dirichlet process and related 
variables. These models will be based on random variables $T$ which have distributions that are 
Generalized Gamma Convolutions (GGC). This large class of self-decomposable infinitely divisible random 
variables was introduced by Thorin (1977, 1978); further developments are given in Bondesson (1979, 
1992). In particular, a subclass of such models will be seen to be connected to the study of mean 
functionals of a Dirichlet process, initiated by Cifarelli and Regazzini (1990). We believe that our 
discussion will also shed some new light on these two lines of research, which are essentially duals of 
one another. Moreover, our approach shows that one may extend the study of mean functionals of 
Dirichlet processes to this larger and more flexible setting. Quite strikingly, this class of NRMs is very 
rich, including the stable law processes of index $0 < \alpha < 1$ (and hence, by a change of variable, the 
entire two parameter Poisson-Dirichlet class discussed in Pitman (1996)), classes of models based 
on the Pareto and log-Normal distributions, and the class of Generalized Inverse Gaussian (GIG) 
models, among many others. 

Let $N$ denote a Poisson random measure on the space $(0,\infty)\times(0,\infty)\times\mathscr{X}$, with mean intensity 

\[
\nu(dr,dv,dx) = \theta r^{-1}e^{-r}\,dr\,\mathscr{U}(dv)\,H(dx) \tag{24}
\]

where $\mathscr{U}$ is a non-negative non-decreasing function on $(0,\infty)$ satisfying $\mathscr{U}(0) = 0$, 

\[
\int_0^1 |\ln(v)|\,\mathscr{U}(dv) < \infty \quad\text{and}\quad \int_1^\infty (1/y)\,\mathscr{U}(dy) < \infty. \tag{25}
\]

Note importantly that it is possible for $\mathscr{U}(\infty) = \infty$. In particular, the condition (25) is true if and 
only if 

\[
\psi(\lambda) = \theta\int_0^\infty\!\!\int_0^\infty (1-e^{-\lambda r/v})\,r^{-1}e^{-r}\,dr\,\mathscr{U}(dv) = \theta\int_0^\infty \ln(1+\lambda/v)\,\mathscr{U}(dv) < \infty. \tag{26}
\]

This allows one to define random variables $T = G_\theta M_\theta = \int_0^\infty\!\int_0^\infty (r/v)\,N(dr,dv)$, where $G_\theta$ is 
independent of $M_\theta$, and $T$ has Lévy exponent $\psi(\lambda)$. Let $\mathscr{T}$ denote the family of all random variables $T$ 
such that the Lévy exponent of $T$ satisfies (26). That is, $\mathscr{T}$ is the class of random variables with 
distributions which are generalized gamma convolutions. This may be extended by random variables 
$T + a$ for $a > 0$. The quantity $\mathscr{U}$ uniquely determines the distribution of $T$, and we refer to $\mathscr{U}$ as 
the Thorin measure. Additionally, 



\[
\mu(dx) = G_\theta\sum_{i=1}^{\infty}(Q_i/V_i)\,\delta_{Z_i}(dx) = \int_0^\infty\!\!\int_0^\infty (r/v)\,N(dr,dv,dx)
\]

and $M_\theta = \sum_{i=1}^{\infty} Q_i/V_i$. $\mu$ may be referred to as a GGC random measure. Then call $P_{M_\theta}$ a GGC 
NRM if it has a representation as an $h$-biased random probability measure, here $s := (r,v)$, with 
$h(r,v) = r/v$, given by 

\[
P_{M_\theta}(dx) = \frac{\sum_{i=1}^{\infty}(Q_i/V_i)\,\delta_{Z_i}(dx)}{M_\theta} = \frac{\int_0^\infty\!\int_0^\infty (r/v)\,N(dr,dv,dx)}{T} \tag{27}
\]

where $(Q_i)$ has the same marginal distribution as in (18), but is now independent of the sequence 
$(Z_i)$ of i.i.d. random variables whose distribution is $H$. Additionally, both sequences are 
independent of $(V_i)$. Note that the distribution of the sequence $(V_i)$ is derived from the points of the Poisson 
random measure $N$ with mean intensity (24). In the special case where $\mathscr{U}$ is a probability measure 
(finite measure), the $(V_i)$ are i.i.d. $\mathscr{U}$. However this is not always true. In fact, obtaining 
many interesting classes, such as the stable law, requires that $\mathscr{U}$ is not a finite measure. 


Remark 12. We mention that if $\mathscr{U}$ is a finite measure then $M_\theta = \int_0^\infty (1/y)\,D_\theta(dy)$, $D_\theta$ being a 
Dirichlet process with shape parameter $\theta\mathscr{U}$. That is, $M_\theta$ corresponds to a class of (positive) Dirichlet 
mean functionals, a subset of the Dirichlet process functionals whose study was initiated by 
Regazzini and Cifarelli (1990). However $M_\theta$ constitutes a wider class of positive random variables, 
as the representation $\sum_{i=1}^{\infty} Q_i\,\delta_{V_i,Z_i}$ does not in general correspond to a Dirichlet process unless the 
$V_i$'s are iid. Rather, $G_\theta\sum_{i=1}^{\infty} Q_i\,\delta_{V_i,Z_i}$ is a gamma process with possibly sigma-finite shape measure 
$\theta\,\mathscr{U}H$. As we shall see, this generality of $M_\theta$ allows us much greater flexibility as there are many 
cases where the distribution of $M_\theta$ and $T$ can be deduced. 


Remark 13. Note interestingly that, by independence of $G_\theta$ and $M_\theta$, the fact that $G_\theta$ is gamma 
distributed yields 

\[
E[e^{-uT}] = E[e^{-uG_\theta M_\theta}] = \int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv) = e^{-\theta\int_0^\infty \ln(1+u/y)\,\mathscr{U}(dy)}
\]

where $f_{M_\theta}$ denotes the density of $M_\theta$. This is essentially the identity established in Cifarelli and 
Regazzini (1990) for Dirichlet mean functionals. Note that the choice of $\mathscr{U}$ uniquely determines 
the distribution of $M_\theta$. Hence when $\mathscr{U}(\infty) = \infty$, such distributions are not captured by the 
current literature on Dirichlet mean functionals. Note additionally that the theory of GGC has been 
extended by Thorin (1978) [see Bondesson (1992)] to include distributions on the entire real line. 
See also Lijoi and Regazzini (2004). Here of course we require that $T$ is positive. 
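
As a simple sanity check of the identity in Remark 13, one can take the Thorin measure to be a unit point mass at 1. This choice, and the symbol $\mathscr{U}$ used here for the Thorin measure, are illustrative assumptions. In that case the model collapses to the gamma process and both sides of the identity agree:

```latex
\[
\mathscr{U} = \delta_1
\;\Longrightarrow\;
V_i \equiv 1,\quad M_\theta = \sum_{i\ge 1} Q_i = 1,\quad T = G_\theta,
\qquad
E\big[e^{-uT}\big] = (1+u)^{-\theta} = \exp\{-\theta\ln(1+u)\}.
\]
```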

7.1 Posterior Distribution of GGC NRM 

Now to establish the posterior distribution, first note that $\mu_{n(\mathbf{p})}\mid U_n = u,\mathbf{X}$ has mean intensity 

\[
\nu_u(dr,dv,dx) = \theta r^{-1}e^{-r(1+u/v)}\,dr\,\mathscr{U}(dv)\,H(dx).
\]

We recognize, from James (2005a, Proposition 2.1), that the change in the intensity from $\nu$ to $\nu_u$ is 
due to exponential tilting by $e^{-uT}$. It is useful to see explicitly how this operation affects the Thorin 
measure, $\mathscr{U}$, and indeed how this affects the resulting distribution of $M_\theta$. 

Proposition 7.1 Let $T\in\mathscr{T}$ be defined by the Thorin measure $\mathscr{U}(dv) := \omega(v)\,dv$. Then suppose that 
$T_b$ is the random variable with density $\propto e^{-bt}f_T(t)$ for some $b > 0$. Then it follows that $T_b\in\mathscr{T}$, with 
Thorin measure $\omega(z-b)I\{z > b\}\,dz$. This follows from the fact that its Lévy exponent is expressible 
as 

\[
\psi_b(\lambda) = \theta\int_b^\infty \ln(1+\lambda/z)\,\omega(z-b)\,dz = \theta\int_0^\infty \ln\big(1+\lambda/(y+b)\big)\,\omega(y)\,dy.
\]

Equivalently, $T_b = G_\theta M_{\theta,b} = \int_0^\infty\!\int_0^\infty (r/y)\,N_b(dr,dy)$, where $N_b$ is a Poisson random measure with 
$E[N_b(dr,dz,dx)] = \theta r^{-1}e^{-r}\,dr\,\omega(z-b)I\{z > b\}\,dz\,H(dx)$. Here $M_{\theta,b} := \sum_{i=1}^{\infty} Q_i/(V_i+b)$ is the mean 
functional whose law is induced by the tilting operation. 


Proof. This follows from a straightforward change of variables or using the fact that $\psi_b(\lambda) = 
\psi(\lambda+b) - \psi(b)$. □ 
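
The identity used in the proof is easy to verify numerically. The sketch below replaces the Thorin measure by a finite discrete measure $\sum_k w_k\delta_{y_k}$, which is an assumption made only so that the integrals become sums; the function names are hypothetical.

```python
from math import log

def levy_exponent(lam, theta, atoms, masses):
    """psi(lam) = theta * sum_k masses[k] * log(1 + lam / atoms[k])."""
    return theta * sum(m * log(1.0 + lam / y) for y, m in zip(atoms, masses))

def tilted_levy_exponent(lam, b, theta, atoms, masses):
    """Levy exponent after tilting by exp(-b T): every Thorin atom y moves to y + b."""
    shifted = [y + b for y in atoms]
    return levy_exponent(lam, theta, shifted, masses)
```

For any positive inputs, `tilted_levy_exponent(lam, b, ...)` agrees with `levy_exponent(lam + b, ...) - levy_exponent(b, ...)` up to floating-point error, which is the discrete version of shifting the Thorin measure by $b$.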


Now examining Proposition 7.1, with $b = U_n$, it follows that 

\[
\mu_{n(\mathbf{p})} = G_\theta\sum_{i=1}^{\infty} Q_i\,(V_i+U_n)^{-1}\,\delta_{Z_i}.
\]

Additionally $T_{n(\mathbf{p})}$ is given by $T_{n(\mathbf{p})} = G_\theta M_{\theta,U_n}$. 

Setting $J_{j,n} := (A_{j,n},V_{j,n})$, its joint distribution given $U_n,\mathbf{X}$ is 

\[
P(A_{j,n}\in dr, V_{j,n}\in dv\mid u) = \frac{\theta\,r^{e_j-1}e^{-r(1+u/v)}\,v^{-e_j}\,\mathscr{U}(dv)\,dr}{\kappa_{e_j}(u)}
\]

where $\kappa_{e_j}(u) = \theta\,\Gamma(e_j)\int_0^\infty (v+u)^{-e_j}\,\mathscr{U}(dv)$. This shows that each $A_{j,n}\mid V_{j,n},U_n$ 
is equivalent in distribution to $G_{j,n}(1+U_n/V_{j,n})^{-1}$, where $G_{j,n}$ denotes a gamma random variable 
with shape $e_j$ and scale 1 independent of $(U_n,V_{j,n})$. The distribution of $V_{j,n}\mid U_n,\mathbf{X}$ is given by the 
density 

\[
P(V_{j,n}\in dv\mid u) = \frac{(v+u)^{-e_j}\,\theta\,\Gamma(e_j)\,\mathscr{U}(dv)}{\kappa_{e_j}(u)} = \frac{(v+u)^{-e_j}\,\mathscr{U}(dv)}{\int_0^\infty (y+u)^{-e_j}\,\mathscr{U}(dy)}. \tag{28}
\]

Additionally, we use the fact that the prior and posterior distribution of 

\[
U_nT = U_nG_\theta M_\theta = U_n\Big[G_\theta\sum_{i=1}^{\infty} Q_i\,(V_i+U_n)^{-1} + \sum_{j=1}^{n(\mathbf{p})} G_{j,n}\,(V_{j,n}+U_n)^{-1}\Big]
\]

where $\Gamma_n = U_nT$ is independent of $N$ and $\mathbf{X}$. Furthermore notice that 

\[
m_n(u) = E\big[T^{\,n}_{n(\mathbf{p})}\mid U_n = u\big] = \frac{\Gamma(\theta+n)\int_0^\infty (1+uv)^{-(\theta+n)}\,v^n\,f_{M_\theta}(dv)}{\Gamma(\theta)\int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv)}. \tag{29}
\]

These facts lead to a non-obvious description of the posterior distribution given $U_n,\mathbf{X}$. 

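Theorem 7.1 Suppose that $P_{M_\theta}$ is the NRM defined by (27). Then the following results hold. 

(i) The posterior distribution of $P_{M_\theta}\mid U_n,\mathbf{X}$ is equivalent to the distribution of the random 
probability measure 

\[
\frac{U_n\,G_{\theta+n}}{\Gamma_n}\Big[\frac{G_\theta}{G_{\theta+n}}\sum_{i=1}^{\infty} Q_i\,(V_i+U_n)^{-1}\,\delta_{Z_i} + \sum_{j=1}^{n(\mathbf{p})} Q_{j,n}\,(V_{j,n}+U_n)^{-1}\,\delta_{Y_j}\Big] \tag{30}
\]

where the gamma random variable $G_{\theta+n} = G_\theta + \sum_{j=1}^{n(\mathbf{p})} G_{j,n}$ is independent of $U_n$, $\Gamma_n$, $(Z_i)$, and 
the $(Q_i)$ are equivalent in distribution to those in (18), and are independent of $\mathbf{X}$. 
Additionally, the vector $(Q_{1,n},\ldots,Q_{n(\mathbf{p}),n})$ is independent of $(U_n,\Gamma_n,(Z_i),G_\theta,G_{\theta+n})$ and, given $\mathbf{p}$, is a 
Dirichlet$(e_1,\ldots,e_{n(\mathbf{p})};\theta)$ vector. The relevant distributions of $V_{j,n}$, $U_n$ are specified by (28) 
and (32). 

(ii) Equivalently, one may write (30) as 

\[
\frac{U_n\,T_{n(\mathbf{p})}}{\Gamma_n}\sum_{i=1}^{\infty}\tilde{Q}_{i,n}\,\delta_{Z_i} + \Big(1-\frac{U_n\,T_{n(\mathbf{p})}}{\Gamma_n}\Big)\sum_{j=1}^{n(\mathbf{p})} Q^{*}_{j,n}\,\delta_{Y_j}
\]

where 

\[
\tilde{Q}_{i,n} = \frac{Q_i\,(V_i+U_n)^{-1}}{\sum_{l=1}^{\infty} Q_l\,(V_l+U_n)^{-1}} \quad\text{and}\quad Q^{*}_{j,n} = \frac{Q_{j,n}\,(V_{j,n}+U_n)^{-1}}{\sum_{l=1}^{n(\mathbf{p})} Q_{l,n}\,(V_{l,n}+U_n)^{-1}}.
\]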


7.2 ^ and some connections to the Bondesson Class 

Before saying more about the distribution of $U_n,\mathbf{X}$, we next describe an important subclass of 
GGC random variables which interestingly are connected to the distribution of $U_n$. As in Kent and 
Tyler (2001, p. 257) let $S$ denote a random variable on $(0,\infty)$ with density 

\[
f_S(s) = C\,s^{\beta-1}\prod_{j=1}^{m}(1+c_js)^{-\gamma_j} \quad\text{for } s > 0, \tag{31}
\]

where $C$ is a normalizing constant, $m \ge 1$, $\beta > 0$, $c_j > 0$, $\gamma_j > 0$. Then the class containing the 
densities (31), together with their weak limits, constitutes the Bondesson $\mathcal{B}$ sub-class of GGC 
models. We write $S\in\mathcal{B}$ if the density of $S$ has the form in (31). Note that in the non-limiting 
case, the corresponding $\mathscr{U}(\infty) = \beta < \infty$. That is, in this case, the $(V_i)$ are iid with distribution 
proportional to $\mathscr{U}$. The $\mathcal{B}$ class contains the stable distributions of index $\alpha = 1/k$ for $k = 2,3,\ldots$, the 
gamma distribution, Pareto and log-Normal random variables, and the generalized inverse Gaussian, among 
many others [see Bondesson (1992, Chapter 5.6)]. This class is known to be hyperbolically completely 
monotone and hence self-decomposable. See Bondesson (1992), Steutel and Van Harn (2004, Chapter 
5) and Kent and Tyler (2001, p. 257) for further details. 

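Proposition 7.2 Let $U_n = \Gamma_n/T = \Gamma_n/(G_\theta M_\theta)$; then its distribution is described as follows. 

(i) The density of $U_n\mid\mathbf{p}$ is 

\[
f_{U_n}(u\mid\mathbf{p}) \propto u^{n-1}\Big[\int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv)\Big]\prod_{j=1}^{n(\mathbf{p})}\int_0^\infty (v+u)^{-e_j}\,\mathscr{U}(dv). \tag{32}
\]

(ii) The marginal density of $U_n$ is 

\[
f_{U_n}(u) = \frac{u^{n-1}}{\Gamma(n)}\,E\big[e^{-uG_\theta M_\theta}(G_\theta M_\theta)^n\big] = \frac{\Gamma(\theta+n)\,u^{n-1}}{\Gamma(n)\Gamma(\theta)}\int_0^\infty (1+uv)^{-(\theta+n)}\,v^n\,f_{M_\theta}(dv). \tag{33}
\]

This implies that for any integrable function $g$, 

\[
E[g(U_n)] = \int_0^1 E\Big[g\Big(\frac{1-y}{y\,M_\theta}\Big)\Big]\,\mathscr{B}(dy\mid\theta,n).
\]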


Proposition 7.3 Suppose that $T\in\mathscr{T}$; then $U_n := \Gamma_n/T = \Gamma_n/(G_\theta M_\theta)$ has the following 
properties. 

(i) If $T := G_\theta M_\theta\in\mathcal{B}$, then $U_n\in\mathcal{B}$. However, $T\in\mathscr{T}$ does not imply that $U_n\in\mathscr{T}$. 

(ii) The distribution of $U_n\mid M_\theta = v$ is in $\mathcal{B}$, with density of the form in (31), with parameters 
$\beta = n$, $m = 1$, $c_1 = v$, $\gamma_1 = \theta+n$. Equivalently, $M_\theta U_n$ has a gamma-gamma density, and hence is 
in $\mathcal{B}$. The form of the density coincides with that of $U_n\mid M_\theta = 1$. That is, $c_1 = 1$. 

(iii) The distribution of $U_n\mid M_\theta = v, (V_{j,n} = v_j), \mathbf{p}$ is in $\mathscr{T}$, with $\beta = n$, $m = n(\mathbf{p})+1$, $c_j = 1/v_j$, 
$\gamma_j = e_j$ for $j = 1,\ldots,n(\mathbf{p})$, and $c_{n(\mathbf{p})+1} = v$, $\gamma_{n(\mathbf{p})+1} = \theta$. Specifically the conditional density 
is given by 

\[
f_{U_n}(u\mid v,(v_j)) = C\,u^{n-1}(1+uv)^{-\theta}\prod_{j=1}^{n(\mathbf{p})}(1+u/v_j)^{-e_j}.
\]

(iv) Let $Y_n = n/U_n$. Then $Y_n\in\mathscr{T}$ and hence this family of densities is dense in the class of all 
$\mathscr{T}$. 


Proof. Statement (i) is an immediate consequence of Bondesson (1992, Theorem 6.2.1, p. 92). 
See also Kent and Tyler (2001, statement 7, p. 257). Statements (ii) and (iii) follow from an 
augmentation and matching with (31). Statement (iv) is read from Bondesson (1992, p. 92). □ 
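
Statement (ii) suggests a direct way to simulate $U_n$ once a value of $M_\theta$ is available. The helper below is a hypothetical sketch: it uses only the definition $U_n = \Gamma_n/(G_\theta M_\theta)$ with independent gamma variables, so that $vU_n$ given $M_\theta = v$ is a ratio of independent gammas with density proportional to $x^{n-1}(1+x)^{-(\theta+n)}$.

```python
import random

def sample_u_given_m(n, theta, v, rng):
    """Draw U_n given M_theta = v as Gamma_n / (G_theta * v)."""
    gamma_n = rng.gammavariate(n, 1.0)       # Gamma_n ~ Gamma(n)
    g_theta = rng.gammavariate(theta, 1.0)   # G_theta ~ Gamma(theta)
    return gamma_n / (g_theta * v)
```

For $\theta > 1$ the mean of $vU_n$ is $n/(\theta-1)$, which gives a quick Monte Carlo check of an implementation.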


The next result gives a form of the EPPF. 

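Proposition 7.4 Suppose that $\mathbf{p}$ denotes the partition derived from $P_{M_\theta}$. Then the following results 
hold. 

(i) The conditional distribution of $\mathbf{p}\mid U_n$ is given by 

\[
PD(\mathbf{p}\mid\theta)\,\frac{\int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv)}{\int_0^\infty (1+uv)^{-(\theta+n)}\,v^n\,f_{M_\theta}(dv)}\,\prod_{j=1}^{n(\mathbf{p})}\int_0^\infty (y+u)^{-e_j}\,\mathscr{U}(dy). \tag{34}
\]

(ii) The EPPF may be expressed as 

\[
PD(\mathbf{p}\mid\theta)\,\frac{\Gamma(\theta+n)}{\Gamma(\theta)\Gamma(n)}\int_0^\infty u^{n-1}\Big[\int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv)\Big]\times\Big[\prod_{j=1}^{n(\mathbf{p})}\int_0^\infty (y+u)^{-e_j}\,\mathscr{U}(dy)\Big]\,du. \tag{35}
\]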

Remark 14. The result in (29) combined with (16) and the form of $\kappa_j$ leads to interesting 
relationships between $\mathscr{U}$ and $M_\theta$. In particular, using the property that $\kappa_1(u) = m_1(u)$ leads to the 
following identity, 

\[
\int_0^\infty (v+u)^{-1}\,\mathscr{U}(dv) = \frac{\int_0^\infty (1+uv)^{-(\theta+1)}\,v\,f_{M_\theta}(dv)}{\int_0^\infty (1+uv)^{-\theta}\,f_{M_\theta}(dv)}.
\]

Remark 15. The unified representation of $P_{M_\theta}$ and the characterization of its posterior 
distribution given in (30) in terms of the Dirichlet process $(Q_i)$ has many interesting implications. For 
instance, it suggests that one can use a variant of the Blocked Gibbs algorithms in Ishwaran and 
James (2001, 2003b) to approximately sample realizations of $P_{M_\theta}$ and its posterior process for many 
different classes of models which are not Dirichlet processes. 


7.3 Some specific examples of GGC NRM 

In this section we highlight some important specific cases. First it is interesting to note that the 
study of the GGC is primarily about establishing the fact that $T\in\mathscr{T}$, and possibly identifying $\mathscr{U}$. 
In contrast, the study of Dirichlet process mean functionals involves identification of the distribution 
of $M_\theta$, when $\mathscr{U}$ is a pre-specified finite measure. We see these two approaches as complementary 
to one another. We point out that explicit forms for $\mathscr{U}$ are not known in every case. However, 
importantly, there are many examples of $T$ which are known to be in $\mathscr{T}$. For instance, an explicit 
form of $\mathscr{U}$ is not known for the Pareto distribution. However, as we have shown, many interesting 
applications involving sampling from $\mathscr{M}$ and mixture models can still be conducted if one knows 
$m_n(u)$ or the cumulants $\kappa_n(u)$. Similarly, from the Dirichlet process literature the explicit law of 
$M_\theta$ is not known in many cases; however, we can choose $\mathscr{U}$ to be from a vast selection of 
probability distributions. Thus, as stated earlier, in that case we have an explicit description of the iid 
distribution of the sequence $(V_i)$. 

As some general examples, one could choose $T = S\in\mathcal{B}$, as defined in (31). Note also that 
$T = S^q\in\mathscr{T}$ for $|q| > 1$ [see Bondesson (1979, Corollary 1)]. Here we give some precise examples 
from the literature. 

Remark 16. Note carefully that we only need to show that $T$ has a particular law to establish 
the law of the NRM $P_{M_\theta} := \mu/T$. This is due to the fact that the Laplace functional of $\mu$, evaluated 
at some bounded measurable function $g$, is determined by 

\[
-\log E\big[e^{-\mu(g)}\big] = \int_{\mathscr{X}}\psi(g(x))\,H(dx)
\]

for $\psi(g(x))$ given by replacing $\lambda$ with $g(x)$ in (26). 

7.3.1 Stable Case and related models 

As mentioned previously, the NRMs based on the stable law have been extensively studied by 
Pitman (1996, 2002) and Pitman and Yor (1997). This class has numerous applications and also has 
a tractable EPPF. The explicit posterior distribution of this class was obtained by Pitman (1996) 
by exploiting its explicit stick-breaking representation. James (2002, section 5.3) gives an 
alternative derivation working directly with the Lévy measure of a stable law. Here, we show that $P_{M_\theta}$ 
offers another representation of the stable law NRM and hence an alternative approach to its analysis. 
We give some details of its posterior analysis which are inherent to its representation in terms of 
a GGC NRM. Further details can be deduced easily from the specific analysis of these models in 
James (2002, sections 5.3 and 5.4). 

From Bondesson (1992, p. 35), it is now easy to see that $T$ is stable, and hence $P_{M_\theta}$ is a stable 
NRM, if 

\[
\mathscr{U}_\alpha(dv) = \frac{\alpha}{\Gamma(1-\alpha)\Gamma(\alpha)}\,v^{\alpha-1}\,dv \quad\text{for } v > 0.
\]

As checks, one can make the change of variable $z = r/v$ in (26) and integrate with respect to $\mathscr{U}$ first. 
That is, for all $\theta > 0$, $\psi(\lambda) = C_{\alpha,\theta}\lambda^{\alpha}$ for some constant $C_{\alpha,\theta}$. Note interestingly that, conditional on 
$U_n$, we have 

\[
\Big(\frac{V_{j,n}}{U_n}+1\Big)^{-1} \sim \mathscr{B}(e_j-\alpha,\alpha) \tag{36}
\]

and, hence, 

\[
G_{j,n}\Big(\frac{V_{j,n}}{U_n}+1\Big)^{-1} \sim \mathscr{G}(e_j-\alpha)
\]

for $j = 1,\ldots,n(\mathbf{p})$. That is, these quantities are independent of $U_n$. Additionally, $\kappa_{e_j}(u) = 
\theta\alpha\,u^{\alpha-e_j}\,\Gamma(e_j-\alpha)/\Gamma(1-\alpha)$. Ignoring scale parameters it follows that the distribution of $L_n := U_n^{\alpha}\mid\mathbf{p}$ 
is $\mathscr{G}(n(\mathbf{p}))$. One then establishes that $\mu_{n(\mathbf{p})}\mid U_n,\mathbf{p}$ is a generalized gamma process, whose Thorin 
measure is given by (38) below, with $b = U_n$. Moreover the distribution of $L_n^{1/\alpha}\mu_{n(\mathbf{p})}\mid L_n,\mathbf{p}$ is 
determined by the Lévy measure 

\[
\frac{L_n\,\theta\,\alpha}{\Gamma(1-\alpha)\Gamma(\alpha)}\,r^{-1}e^{-r}\,(v-1)^{\alpha-1}\,I\{v > 1\}\,dr\,dv\,H(dx). \tag{37}
\]

In particular this implies that the distribution of $L_n^{1/\alpha}T_{n(\mathbf{p})}\mid\mathbf{p}$ is $\mathscr{G}(n(\mathbf{p})\alpha)$. These formulae can be 
used in Proposition 7.1 and 7.3 to establish the known results about the posterior distribution. 
Further details can be deduced from James (2002, section 5). The two parameter Poisson-Dirichlet 
distribution with parameters $(\alpha,q)$ for $0 < \alpha < 1$ and $q > -\alpha$ arises from the weighted law 
$\propto T^{-q}P(dN\mid\nu)$ as described in Proposition 8.2. It remains to note that $L_{n,q} := U_n^{\alpha}$ given $\mathbf{p}$ is 
$\mathscr{G}(n(\mathbf{p})+q/\alpha)$. 

Remark 17. Note also that, in the case where $T$ is the stable law, the generalized Cauchy-Stieltjes 
transform of $M_\theta$ is 

\[
\int_0^\infty (1+\lambda v)^{-\theta}\,f_{M_\theta}(dv) = e^{-C_{\alpha,\theta}\lambda^{\alpha}}
\]

which can be easily inverted. This example, in some sense the easiest, is outside of the scope of the 
current theory of Dirichlet process mean functionals as $\mathscr{U}(\infty) = \infty$. 
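
As a worked check of the claim $\psi(\lambda) = C_{\alpha,\theta}\lambda^{\alpha}$, assume that the stable Thorin measure has density $\alpha v^{\alpha-1}/[\Gamma(1-\alpha)\Gamma(\alpha)]$ on $(0,\infty)$ and that $\psi(\lambda)$ equals $\theta$ times the integral of $\ln(1+\lambda/v)$ against that measure. Under these assumptions, the substitution $v = \lambda w$, an integration by parts, and the reflection formula $\Gamma(\alpha)\Gamma(1-\alpha) = \pi/\sin(\pi\alpha)$ give the constant explicitly:

```latex
\begin{align*}
\psi(\lambda)
 &= \frac{\theta\alpha}{\Gamma(1-\alpha)\Gamma(\alpha)}\int_0^\infty \ln(1+\lambda/v)\,v^{\alpha-1}\,dv
  = \frac{\theta\alpha\,\lambda^{\alpha}}{\Gamma(1-\alpha)\Gamma(\alpha)}\int_0^\infty \ln(1+1/w)\,w^{\alpha-1}\,dw \\
 &= \frac{\theta\,\lambda^{\alpha}}{\Gamma(1-\alpha)\Gamma(\alpha)}\int_0^\infty \frac{w^{\alpha-1}}{1+w}\,dw
  = \frac{\theta\,\lambda^{\alpha}}{\Gamma(1-\alpha)\Gamma(\alpha)}\cdot\frac{\pi}{\sin(\pi\alpha)}
  = \theta\,\lambda^{\alpha}.
\end{align*}
```

So, with this particular normalization of the Thorin measure, the constant would be $C_{\alpha,\theta} = \theta$; a different normalization would only rescale it.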



7.3.2 Generalized Gamma 


The class of generalized gamma processes defined for $0 < \alpha < 1$ and $b > 0$ [see Brix (1999)] arises 
from the tilting by $e^{-bS_\alpha}$, where $S_\alpha$ is a stable law of index $\alpha$. The simplest case is when $\alpha = 1/2$, 
where the corresponding $T$ has an Inverse Gaussian distribution. Proposition 7.1 shows that the 
Thorin measure is given, in this case, by 

\[
\frac{\alpha}{\Gamma(1-\alpha)\Gamma(\alpha)}\,(v-b)^{\alpha-1}\,dv \quad\text{for } v > b. \tag{38}
\]

The Thorin measure of $\mu_{n(\mathbf{p})}\mid U_n,\mathbf{p}$ is of the same form as (38) with $b$ replaced by $U_n+b$, and hence 
$\mu_{n(\mathbf{p})}\mid U_n,\mathbf{p}$ is a generalized gamma process. It follows that, setting $L_n = C_{\alpha,\theta}(U_n+b)^{\alpha}$, the distribution of 
$L_n^{1/\alpha}\mu_{n(\mathbf{p})}\mid L_n,\mathbf{p}$ is the same as that for the stable law determined by (37). Similar to (36) one has, 
conditional on $U_n$, 

\[
\Big(\frac{V_{j,n}-b}{U_n+b}+1\Big)^{-1} \sim \mathscr{B}(e_j-\alpha,\alpha) \quad\text{and}\quad G_{j,n}\Big(\frac{V_{j,n}-b}{U_n+b}+1\Big)^{-1} \sim \mathscr{G}(e_j-\alpha)
\]

for $j = 1,\ldots,n(\mathbf{p})$, with $\kappa_{e_j}(u) = \theta\alpha\,(u+b)^{\alpha-e_j}\,\Gamma(e_j-\alpha)/\Gamma(1-\alpha)$. The density of $U_n\mid\mathbf{p}$ is 

\[
f_{U_n}(u\mid\mathbf{p}) \propto (u+b)^{n(\mathbf{p})\alpha-n}\,u^{n-1}\,e^{-C_{\alpha,\theta}[(u+b)^{\alpha}-b^{\alpha}]}.
\]
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Using a Binomial expansion, the distribution of L n |p is given, for all b > 0, by 
, , ) EVq ( n ?)(-l) k w- k /“I{w > C a ,eW(w\n(p)) 

z:zUV)(-i) k nG-v a i{G n{p) >c a , e }\ 

The normalizing constant can be used to yield an explicit expression for the EPPF. 

Remark 18. Note that for $a = 0$ and $b > 0$, the generalized gamma process
coincides with the gamma process.


7.3.3 Generalized Inverse Gaussian 

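A more challenging class is the Generalized Inverse Gaussian (GIG) class of models. First set $\theta = 1$.
Let $\lambda$, $\gamma$ and $\delta$ be such that $-\infty < \lambda < \infty$, while $\gamma$ and $\delta$ are non-negative and not simultaneously 0.
As in Barndorff-Nielsen and Shephard (2001), $T$ is GIG$(\lambda, \delta, \gamma)$ if its density is of the form

(39) $f_T(t \mid \lambda, \delta, \gamma) = \frac{(\gamma/\delta)^{\lambda}}{2K_\lambda(\delta\gamma)}\, t^{\lambda-1} \exp\left\{-\tfrac{1}{2}\left(\delta^2 t^{-1} + \gamma^2 t\right)\right\}$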


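where $K_\lambda$ is a modified Bessel function of the third kind. When $\delta = 0$ and $\lambda > 0$, $\gamma > 0$, GIG$(\lambda, 0, \gamma)$ equates
with the gamma distribution. When $\lambda < 0$, $\delta > 0$ and $\gamma = 0$, then GIG$(\lambda, \delta, 0)$ is a reciprocal, or
inverse, gamma distribution. Using the parametrization $\lambda = -a$, for $a > 0$, and $b = \delta^2/2$ yields
an inverse gamma distribution with parameters $a, b$, with density

(40) $\frac{(\delta^2/2)^{-\lambda}}{\Gamma(-\lambda)}\, t^{\lambda-1} \exp\left\{-\delta^2 t^{-1}/2\right\} = \frac{b^a}{\Gamma(a)}\, t^{-a-1} \exp\left\{-b t^{-1}\right\}$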

A special case of this is $\lambda = -1/2$, leading to a stable law of index 1/2. The Inverse Gaussian
distribution is defined by setting $\lambda = -1/2$, $\delta > 0$ and $\gamma > 0$, that is, a GIG$(-1/2, \delta, \gamma)$. The hyperbolic
distribution coincides with the case $\lambda = 1$. Now define,

(41) $g_\nu(x) = \frac{2}{x\pi^2\left[J_\nu^2(\sqrt{x}) + N_\nu^2(\sqrt{x})\right]}$

where $J_\nu$ and $N_\nu$ are Bessel functions of the first and second kind respectively. The expression (41) is
central to a body of work on the infinite divisibility of the Student t-distribution and generalized gamma
convolutions. One has for $m = 0, 1, 2, \ldots$,


(42) $g_{m+1/2}(x) = \frac{x^{(2m-1)/2}}{\pi \prod_{j=1}^{m} (x + \alpha_j^2)}$

where $\alpha_1, \ldots, \alpha_m$ are the zeros of $K_{m+1/2}(z)$. It is known that the Thorin measure is given by


(43) 


= I { x>v 2 / 2 } 



S 2 v 2 )dy + max(0, A) 


dx 


The simplest cases correspond to the gamma distribution, that is the Dirichlet process, and cases
covered by (42). Setting $m = 0$, $\gamma = 0$ in (43) coincides with the Thorin measure of a stable (1/2) law. When
$\gamma > 0$, one obtains the Inverse Gaussian distribution. Setting $\lambda = -3/2$ gives $m = 1$, which, with
$\gamma > 0$, $\delta > 0$, yields the Thorin measure given by

^ (dy) = -^(2 y-v 2 + a 2 S~ 2 ) 1 1{y > v 2 /2} d y 

7 T z 
The simplest case arises if one further sets $\gamma^2 = \alpha_1^2\delta^{-2}$. The other cases involving $m$ are slightly
more complex but certainly can be handled. For the general case, using gD one can calculate the K n 
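from the moments $m_n$, which are obtained as ratios of Bessel functions $K_\lambda$. Making the substitution
$u = (w^2 - \gamma^2)/2$ for $w > \gamma > 0$ gives the Laplace transform and the $m_n$ as follows,

$E\left[e^{-(w^2-\gamma^2)T/2}\right] = \frac{\gamma^{\lambda} K_\lambda(\delta w)}{w^{\lambda} K_\lambda(\delta\gamma)}$

and

$m_n\left((w^2-\gamma^2)/2\right) = \delta^n w^{-n}\, \frac{K_{n+\lambda}(\delta w)}{K_\lambda(\delta w)}$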

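The marginal density of $L_n = \sqrt{2U_n + \gamma^2}$ is given by

$f_{L_n}(w) = \frac{\gamma^{\lambda}\, 2^{-(n-1)} (w^2-\gamma^2)^{n-1}\, w^{-(n+\lambda-1)}\, \delta^n\, K_{n+\lambda}(\delta w)}{\Gamma(n)\, K_\lambda(\delta\gamma)}\, I\{w > \gamma\}$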

Note that one can use the further simplification for $n = 0, 1, 2, \ldots$

$K_{n+1/2}(z) = \sqrt{\frac{\pi}{2z}}\; e^{-z} \sum_{k=0}^{n} \frac{(n+k)!}{(n-k)!\,k!}\,(2z)^{-k}$

Details may be deduced from Barndorff-Nielsen and Shephard (2001). 
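
As an illustration of how the half-integer simplification can be used in practice, the following is a minimal Python sketch, not taken from the cited references. It evaluates $K_{n+1/2}(z)$ through the finite sum and then forms the moment ratio $\delta^n w^{-n} K_{n+\lambda}(\delta w)/K_\lambda(\delta w)$ for the special case $\lambda = m + 1/2$ with integer $m \ge 0$. The function names `bessel_k_half` and `gig_moment_half` are hypothetical and introduced only for this example.

```python
import math


def bessel_k_half(n: int, z: float) -> float:
    """Evaluate K_{n+1/2}(z) for integer n >= 0 and z > 0 via the finite sum."""
    if n < 0 or z <= 0.0:
        raise ValueError("requires n >= 0 and z > 0")
    total = 0.0
    for k in range(n + 1):
        coeff = math.factorial(n + k) / (math.factorial(n - k) * math.factorial(k))
        total += coeff * (2.0 * z) ** (-k)
    return math.sqrt(math.pi / (2.0 * z)) * math.exp(-z) * total


def gig_moment_half(n: int, m: int, delta: float, w: float) -> float:
    """Moment ratio delta^n w^{-n} K_{n+lam}(delta w) / K_lam(delta w), lam = m + 1/2.

    Assumption of this sketch: lam is a positive half-integer, so that both
    Bessel functions reduce to the closed form above.
    """
    z = delta * w
    return (delta / w) ** n * bessel_k_half(n + m, z) / bessel_k_half(m, z)


if __name__ == "__main__":
    # For lam = 1/2 the first moment reduces to (delta / w) * (1 + 1 / (delta * w)).
    print(gig_moment_half(1, 0, 2.0, 3.0))
```

Because the sum is finite, no special-function library is needed for half-integer orders; other values of $\lambda$ would require a general Bessel routine.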

Remark 19. We mention again that, although the exact form of the density for $L_n$ or $U_n$
appears to be complicated, these are easily simulated using the fact that $U_n = \Gamma_n/T$, where $\Gamma_n$ is a gamma
random variable independent of $T$. For example, in the case of the inverse gamma distribution, $U_n = \Gamma_n G_a$.
The exact representation of the density is of interest, for instance, in possible connections with and
interpretations in the theory of special functions. Lijoi and Regazzini (2004) is an example of recent work
exploring the interface between special functions and problems arising in Bayesian nonparametrics. See also James (2005b).
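
The simulation route mentioned in Remark 19 can be sketched as follows for the inverse gamma case. This is a minimal illustration under stated assumptions rather than a definitive implementation: $T$ is taken to be inverse gamma with density proportional to $t^{-a-1}e^{-b/t}$, generated as $T = b/G_a$ with $G_a$ a unit-scale gamma variable of shape $a$, and the gamma variable in the numerator is assumed to have shape $n$ and unit scale. The function name `sample_u_n` is hypothetical.

```python
import random


def sample_u_n(n: int, a: float, b: float, rng: random.Random) -> float:
    """Draw U_n = Gamma_n / T with T inverse gamma(a, b) (assumed parametrization)."""
    gamma_n = rng.gammavariate(n, 1.0)  # shape n, unit scale
    g_a = rng.gammavariate(a, 1.0)      # shape a, unit scale
    t = b / g_a                         # inverse gamma draw for T
    return gamma_n / t


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_u_n(5, 2.0, 1.0, rng) for _ in range(20000)]
    # Under the assumptions above, E[U_n] = n * a / b.
    print(sum(draws) / len(draws))
```

Under these assumptions $U_n$ is a product of independent gamma variables divided by $b$, so no evaluation of the density of $U_n$ is required.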


7.3.4 First passage time distribution 

The next example, taken from Bondesson (1992, p.37), involves a Thorin measure which is a proper distribution.
For simplicity set $\theta = 1$. Let $1/2 < p < 1$; then $T$ has a first passage time distribution if its moment
generating function evaluated at $\lambda$ has the form, for $b = 2\sqrt{p(1-p)} < 1$,

$\frac{1 - \lambda - \sqrt{(1-\lambda)^2 - b^2}}{2(1-p)}$

In this case, the Thorin measure is $\frac{1}{\pi}(y - 1 + b)^{-1/2}(1 + b - y)^{-1/2}\,dy$, for $1 - b < y < 1 + b$.
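
A quick numerical consistency check of this example can be sketched as follows. It assumes the standard generalized gamma convolution representation with $\theta = 1$, namely that the moment generating function equals $\exp\{-\int \log(1 - \lambda/y)\,U(dy)\}$ where $U$ is the Thorin measure, and it assumes the arcsine form of $U$ on $(1-b, 1+b)$. The substitution $y = 1 + b\cos t$ turns $U$ into the uniform law of $t$ on $(0, \pi)$, which a midpoint rule handles easily. The function names are hypothetical.

```python
import math


def mgf_closed_form(lam: float, p: float) -> float:
    """Closed-form moment generating function of the first passage time law."""
    b = 2.0 * math.sqrt(p * (1.0 - p))
    return (1.0 - lam - math.sqrt((1.0 - lam) ** 2 - b * b)) / (2.0 * (1.0 - p))


def mgf_from_thorin(lam: float, p: float, steps: int = 2000) -> float:
    """exp(-integral of log(1 - lam / y) against the arcsine Thorin measure).

    Requires lam < 1 - b so that the logarithm is well defined on the support.
    """
    b = 2.0 * math.sqrt(p * (1.0 - p))
    if lam >= 1.0 - b:
        raise ValueError("requires lam < 1 - b")
    acc = 0.0
    for i in range(steps):
        t = (i + 0.5) * math.pi / steps  # midpoint of the i-th cell in (0, pi)
        y = 1.0 + b * math.cos(t)        # point in the support (1 - b, 1 + b)
        acc += math.log(1.0 - lam / y)
    return math.exp(-acc / steps)        # average over uniform t


if __name__ == "__main__":
    print(mgf_closed_form(-1.0, 0.75), mgf_from_thorin(-1.0, 0.75))
```

The two evaluations should agree to high accuracy for any admissible $\lambda$, which gives a simple way to test a proposed Thorin measure against a known transform.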

7.3.5 Some examples from the Dirichlet process mean functional 

As mentioned previously, when the Thorin measure is finite then $M_\theta$ has the law of a Dirichlet process mean functional.
That is, taking the Thorin measure as a distribution function, the $(V_i)$ are iid. Due to the work of Cifarelli and
Regazzini (1990), the law of $M_\theta$ is known to often have a complex density which is not commonly
seen in the literature. It is of course a simple matter to then obtain an expression for the distribution
of $T = G_\theta M_\theta$. Here we state two examples. First, suppose that $\theta = 1$ and $1/V_i$ is chosen to have a
uniform distribution on $(0,1)$. Then it is known that the distribution of $M_1 = \int_1^\infty (1/y) D_1(dy)$ has
a density given by

$f_{M_1}(v) = \frac{e}{\pi}\, v^{-v} (1-v)^{-(1-v)} \sin(\pi v)$ for $0 < v < 1$

That is, the Thorin measure is $y^{-2}dy$ for $y > 1$. The final example may be found in Cifarelli and Melilli (2000).
Suppose that $1/V_i$ is $\mathcal{B}(1/2, 1/2)$, that is the Arc-sine law. Then for all $\theta > 0$, $M_\theta$ is
$\mathcal{B}(\theta + 1/2, \theta + 1/2)$.
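
The Arc-sine example can be explored by simulation. The sketch below is illustrative only: it approximates the Dirichlet process by a truncated stick-breaking construction, which is an assumption of this example and not the method used in the cited works, and draws the atoms $1/V_i$ from the Beta(1/2, 1/2) law. The truncation level `sticks` and the function name `dp_mean_arcsine` are hypothetical choices.

```python
import random


def dp_mean_arcsine(theta: float, rng: random.Random, sticks: int = 150) -> float:
    """Approximate draw of the Dirichlet process mean with Arc-sine atoms."""
    remaining = 1.0  # mass not yet assigned to any atom
    mean = 0.0
    for _ in range(sticks):
        frac = rng.betavariate(1.0, theta)  # stick-breaking fraction
        atom = rng.betavariate(0.5, 0.5)    # Arc-sine distributed atom in (0, 1)
        mean += remaining * frac * atom
        remaining *= 1.0 - frac
    return mean  # the leftover mass is negligible for moderate theta


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [dp_mean_arcsine(2.0, rng) for _ in range(2000)]
    m = sum(draws) / len(draws)
    v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
    # A Beta(theta + 1/2, theta + 1/2) law has mean 1/2 and variance 1 / (8 (theta + 1)).
    print(m, v)
```

Comparing the empirical mean and variance with those of the Beta$(\theta + 1/2, \theta + 1/2)$ law gives a simple sanity check on the stated result.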
8 Appendix

8.1 Proof of Theorem 2.1, Propositions 2.1-2.3 


PROOF. An initial description of the posterior distribution of $N \mid \mathbf{X}$ follows as a simple variant
of Theorem 3.2 in James (2005a). First note that the result in James (2005a, Theorem 3.2) obviously holds
with $h(s)$ in place of $s$. One can easily verify this by using James (2005a, Theorem 3.1).
Then by using that result with $h(s,N) := h(s)/T$, it follows that the posterior distribution of $N \mid \mathbf{X}$
is equivalent to the distribution of the random measure $N^* = N + \sum_{j=1}^{n(p)} \delta_{J_{j,n},Y_j}$, where the joint
law of $N, (J_{j,n}) \mid \mathbf{X}$ is proportional to the joint measure,


(44) $\Big[T + \sum_{j=1}^{n(p)} h(J_{j,n})\Big]^{-n}\, P(dN \mid \nu) \prod_{j=1}^{n(p)} [h(J_{j,n})]^{e_j}\, \rho(dJ_{j,n} \mid Y_j)$


Note that $N$, $T$ correspond to $N_{n(p)}$, $T_{n(p)}$, and the total mass under $N^*$ is $T + \sum_{j=1}^{n(p)} h(J_{j,n})$. Additionally $P(dN \mid \nu)$ is a
Poisson law with intensity $\nu$, which importantly is the same as the prior law of $N$. That is, under
$P(dN \mid \nu)$, $P(T \in dt) = f_T(t)dt$. Now an application of the gamma identity yields a posterior
distribution of $N, (J_{j,n}), U_n \mid \mathbf{X}$ proportional to,


(45) $u^{n-1} e^{-uT}\, P(dN \mid \nu) \prod_{j=1}^{n(p)} [h(J_{j,n})]^{e_j}\, e^{-u h(J_{j,n})}\, \rho(dJ_{j,n} \mid Y_j)$

The result then follows by applications of Bayes rule to (45). In particular notice that, from
Proposition 2.1 of James (2005a),

$e^{-uT + \psi(u)}\, P(dN \mid \nu) = P(dN \mid \nu_u)$


The description of appearing in Proposition 2.3 is an immediate consequence of Theorem 3.2 of 
James (2005a) combined with the gamma identity. □ 


8.2 The prediction rule 


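From Propositions 2.2 and 2.3 one can derive the Bayesian prediction rule, i.e.

$P(X_{n+1} \in dx_{n+1} \mid \mathbf{X}) = \frac{\mathcal{M}(dX_1, \ldots, dX_{n+1})}{\mathcal{M}(dX_1, \ldots, dX_n)} = \int_0^{+\infty} E[P_n^*(dx_{n+1}) \mid u, \mathbf{X}]\, f_{U_n}(u \mid \mathbf{X})\, du$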


One can rewrite the predictive distribution as a linear combination of the measure $\eta$ and of a weighted
empirical distribution as follows


. "(p) 

(46) P(X n+1 G dx n+1 |X) = C (n) V(dx n+ 1 ) + - V d" } S Yj (dx n+1 ) 

3 =1 

where C (n) = £ / 0 +O ° n(u\x n+1 ) f Un (u\X) d u and, for each j = 1 ,..., n(p), 

c (n) = r + °° F e ^+i ( u|T-) fuMX) du = f + °° E [h{Jj n) | U) X ] f Uniu \X) du. 

Jo T ej,„\ u \Xj) Jo 

See also James (2002) and Prünster (2002). One immediately notices that, in general, the empirical
distribution $\sum_{j=1}^{n(p)} e_j \delta_{Y_j}/n$ is no longer sufficient for prediction. This is in contrast to what happens
with the Dirichlet process.
8.3 Results for weighted Poisson laws

Suppose that $g$ is a positive measurable function such that, without loss of generality, $E[g(N)] =
\int_{\mathbb{M}} g(N) P(dN \mid \nu) = 1$. In this section we describe what happens when $P$ is governed by a weighted
Poisson law $P_g(dN \mid \nu) \propto g(N) P(dN \mid \nu)$. We also highlight the case where $g(N) = T^{-q}/E[T^{-q}]$ for
some $-\infty < q < \infty$.

Proposition 8.1 Suppose that $P$ is governed by the weighted Poisson law $P_g(dN \mid \nu)$ described above.
Then it follows that the law of $(N_{n(p)}, \mathbf{X}, U_n)$ is proportional to


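$g\Big(N + \sum_{j=1}^{n(p)} \delta_{J_{j,n},Y_j}\Big)\, P(dN \mid \nu_u) \Big[\prod_{j=1}^{n(p)} P(J_{j,n} \in ds_j \mid u)\Big]\, \mathcal{M}(d\mathbf{X} \mid u)\, f_{U_n}(u)$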


where $N_{n(p)} := N$, and otherwise the laws above correspond to those given in Theorems 2.1 and
3.1. An application of Bayes rule yields the relevant marginal and posterior distributions.

We now describe an important special case of Proposition 8.1. 


Proposition 8.2 Suppose that $w_q = E[T^{-q}] < \infty$ for some $-\infty < q < \infty$. Furthermore assume
that $n + q > 0$. Then the NRM $P$ defined by the weighted Poisson measure $\propto T^{-q} P(dN \mid \nu)$ has the
following properties.

(i) The law of $(N_{n(p)}, U_n = u)$ is

$P(dN \mid \nu_u) \Big[\prod_{j=1}^{n(p)} P(J_{j,n} \in ds_j \mid u)\Big]\, \mathcal{M}(d\mathbf{X} \mid u)\, f_{U_{n,q}}(u)$

where $f_{U_{n,q}}(u) = u^q\, [\Gamma(n)/\Gamma(n+q)]\, w_q^{-1} f_{U_n}(u)$.

(ii) The density $f_{U_{n,q}}$ corresponds to a random variable $U_{n,q} = \Gamma_{n+q}/T$, where $\Gamma_{n+q}$ is a gamma
random variable independent of $N$, and $T$ now has marginal density $t^{-q} w_q^{-1} f_T(t)$.

(iii) The posterior distributions of $N$, and hence $P$ and $p$, given $U_{n,q}, \mathbf{X}$ are the same as in Theorem
2.1. The marginal distributions of $\mathbf{X}$ and $p$ are given by $\int_0^\infty \mathcal{M}(d\mathbf{X} \mid u)\, f_{U_{n,q}}(u)\, du$.


Proof. The result follows by applying the gamma identity to $T^{-(n+q)}$.
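
Part (ii) of Proposition 8.2 suggests a direct way to simulate $U_{n,q}$. The sketch below illustrates this in the simplest setting and rests on an assumption that is not part of the proposition: $T$ is taken to be a unit-scale gamma variable with shape $\theta$ (the Dirichlet process case), so that the tilted density $t^{-q} w_q^{-1} f_T(t)$ is again gamma with shape $\theta - q$, which requires $\theta > q$. The gamma variable in the numerator is assumed to have shape $n + q$ and unit scale. The function name `sample_u_nq` is hypothetical.

```python
import random


def sample_u_nq(n: int, q: float, theta: float, rng: random.Random) -> float:
    """Draw U_{n,q} = Gamma_{n+q} / T with T from the tilted gamma law (assumed case)."""
    if n + q <= 0.0 or theta - q <= 0.0:
        raise ValueError("requires n + q > 0 and theta > q")
    gamma_nq = rng.gammavariate(n + q, 1.0)      # shape n + q, unit scale
    t_tilted = rng.gammavariate(theta - q, 1.0)  # density proportional to t^{-q} f_T(t)
    return gamma_nq / t_tilted


if __name__ == "__main__":
    rng = random.Random(2)
    draws = [sample_u_nq(3, 1.0, 6.0, rng) for _ in range(40000)]
    # In this gamma case E[U_{n,q}] = (n + q) / (theta - q - 1) when theta - q > 1.
    print(sum(draws) / len(draws))
```

For other choices of $T$ the same two-step recipe applies, provided one can sample from the tilted density of $T$.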


Acknowledgements We would like to thank Lennart Bondesson for a helpful conversation 